Introduction to Python
Chen Lin
clin@brandeis.edu
COSI 134a
Volen 110
Office Hour: Thurs. 3-5
For More Information?
http://python.org/
- documentation, tutorials, beginners guide, core distribution, ...
Books include:
- Learning Python by Mark Lutz
- Python Essential Reference by David Beazley
- Python Cookbook, ed. by Martelli, Ravenscroft and Ascher
  (online at http://code.activestate.com/recipes/langs/python/)
- http://wiki.python.org/moin/PythonBooks
Python Videos
http://showmedo.com/videotutorials/python
- “5 Minute Overview (What Does Python Look Like?)”
- “Introducing the PyDev IDE for Eclipse”
- “Linear Algebra with Numpy”
- And many more
4 Major Versions of Python
“Python” or “CPython” is written in C
- Version 2.7 came out in mid-2010
- Version 3.1.2 came out in early 2010
“Jython” is written in Java for the JVM
“IronPython” is written in C# for the .NET environment
Development Environments
What IDE to use? http://stackoverflow.com/questions/81584
1. PyDev with Eclipse
2. Komodo
3. Emacs
4. Vim
5. TextMate
6. Gedit
7. Idle
8. PIDA (Linux) (Vim based)
9. Notepad++ (Windows)
10. Bluefish (Linux)
PyDev with Eclipse
Python Interactive Shell
% python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
You can type things directly into a running Python session
>>> 2+3*4
14
>>> name = "Andrew"
>>> name
'Andrew'
>>> print "Hello", name
Hello Andrew
>>>
- Background
- Data Types/Structure
- Control flow
- File I/O
- Modules
- Class
- NLTK
List
A compound data type:
[0]
[2.3, 4.5]
[5, "Hello", "there", 9.8]
[]
Use len() to get the length of a list
>>> names = ["Ben", "Chen", "Yaqin"]
>>> len(names)
3
Use [ ] to index items in the list
>>> names[0]    # [0] is the first item
'Ben'
>>> names[1]    # [1] is the second item, ...
'Chen'
>>> names[2]
'Yaqin'
>>> names[3]    # out-of-range values raise an exception
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> names[-1]   # negative values go backwards from the last element
'Yaqin'
>>> names[-2]
'Chen'
>>> names[-3]
'Ben'
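The indexing rules above can be sketched as a small script. This is only an illustration, written for Python 3 (the slides use Python 2); the helper name `safe_get` is hypothetical and not part of the lecture.

```python
names = ["Ben", "Chen", "Yaqin"]


def safe_get(items, index, default=None):
    """Return items[index], or default when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default


first = names[0]              # counting starts at 0
last = names[-1]              # negative indices count back from the end
missing = safe_get(names, 3)  # index 3 does not exist, so the default is returned

print(first, last, missing)
```

Catching IndexError is one option; checking `index < len(items)` first would work as well.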
Strings share many features with lists
>>> smiles = "C(=N)(N)N.C(=O)(O)O"
>>> smiles[0]
'C'
>>> smiles[1]
'('
>>> smiles[-1]
'O'
>>> smiles[1:5]     # use "slice" notation to get a substring
'(=N)'
>>> smiles[10:-4]
'C(=O)'
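A short sketch of slice notation on the same SMILES string, written for Python 3. The variable names are chosen here for illustration. The start index is included, the stop index is excluded, and either one may be left out.

```python
smiles = "C(=N)(N)N.C(=O)(O)O"

head = smiles[:9]      # from the beginning up to (not including) index 9
tail = smiles[10:]     # from index 10 to the end
middle = smiles[1:5]   # indices 1, 2, 3 and 4

print(head)
print(tail)
print(middle)
```

The same notation works on lists, for example `names[1:]`.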
String Methods: find, split
>>> smiles = "C(=N)(N)N.C(=O)(O)O"
>>> smiles.find("(O)")      # use "find" to find the start of a substring
15
>>> smiles.find(".")
9
>>> smiles.find(".", 10)    # start looking at position 10; find returns -1 if it couldn't find a match
-1
>>> smiles.split(".")       # split the string into parts with "." as the delimiter
['C(=N)(N)N', 'C(=O)(O)O']
>>>
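As a sketch of how find and split relate, the snippet below (Python 3) splits the string by hand at the position find reports and compares the result with split. The names `left`, `right` and `parts` are illustrative only.

```python
smiles = "C(=N)(N)N.C(=O)(O)O"

dot = smiles.find(".")
if dot != -1:                     # -1 would mean "no dot found"
    left = smiles[:dot]           # text before the dot
    right = smiles[dot + 1:]      # text after the dot

parts = smiles.split(".")         # does the same job in one call

print(dot, left, right, parts)
```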
String operators: in, not in
if "Br" in "Brother":
    print "contains brother"
email_address = "clin"
if "@" not in email_address:
    email_address += "@brandeis.edu"
String Method: “strip”, “rstrip”, “lstrip” are ways to remove whitespace or selected characters
>>> line = " # This is a comment line \n"
>>> line.strip()
'# This is a comment line'
>>> line.rstrip()
' # This is a comment line'
>>> line.rstrip("\n")
' # This is a comment line '
>>>
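One detail worth a sketch (Python 3): the optional argument to strip, lstrip and rstrip is treated as a set of characters to remove, not as a prefix or suffix. The variable names below are illustrative.

```python
line = "   # This is a comment line \n"

stripped = line.strip()           # removes whitespace from both ends
no_newline = line.rstrip("\n")    # removes only trailing newline characters
text = stripped.lstrip("# ")      # removes any leading '#' or ' ' characters

print(repr(stripped))
print(repr(no_newline))
print(repr(text))
```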
More String methods
email.startswith("c"), email.endswith("u")    # each returns True or False
>>> "%s@brandeis.edu" % "clin"
'clin@brandeis.edu'
>>> names = ["Ben", "Chen", "Yaqin"]
>>> ", ".join(names)
'Ben, Chen, Yaqin'
>>> "chen".upper()
'CHEN'
Unexpected things about strings
>>> s = "andrew"
>>> s[0] = "A"      # strings are read only
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> s = "A" + s[1:]
>>> s
'Andrew'
“\” is for special characters
\n -> newline
\t -> tab
\\ -> backslash
...
But Windows uses backslash for directories!
filename = "M:\nickel_project\reactive.smi" # DANGER!
filename = "M:\\nickel_project\\reactive.smi" # Better!
filename = "M:/nickel_project/reactive.smi" # Usually works
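To make the danger concrete, here is a small Python 3 sketch. It also shows a raw string (the `r"..."` prefix), which the slide does not mention and is included here only as one more option.

```python
danger = "M:\nickel_project\reactive.smi"       # \n and \r become control characters
escaped = "M:\\nickel_project\\reactive.smi"    # each \\ is one literal backslash
raw = r"M:\nickel_project\reactive.smi"         # raw string: backslashes are kept as typed

print(repr(danger))
print(escaped)
print(raw)
```

The first string silently contains a newline and a carriage return, which is why it is shorter than the other two.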
Lists are mutable - some useful methods
>>> ids = ["9pti", "2plv", "1crn"]
>>> ids.append("1alm")          # append an element
>>> ids
['9pti', '2plv', '1crn', '1alm']
ids.extend(L): extend the list by appending all the items in the given list L; equivalent to ids[len(ids):] = L.
>>> del ids[0]                  # remove an element
>>> ids
['2plv', '1crn', '1alm']
>>> ids.sort()                  # sort by default order
>>> ids
['1alm', '1crn', '2plv']
>>> ids.reverse()               # reverse the elements in a list
>>> ids
['2plv', '1crn', '1alm']
>>> ids.insert(0, "9pti")       # insert an element at some specified position (slower than .append())
>>> ids
['9pti', '2plv', '1crn', '1alm']
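A brief Python 3 sketch contrasting append with extend, and the in-place sort with the built-in sorted. The extra IDs "1abc" and "2xyz" are made-up values used only for this example.

```python
ids = ["9pti", "2plv", "1crn"]

ids.append("1alm")              # adds a single element
ids.extend(["1abc", "2xyz"])    # adds every element of another list

in_order = sorted(ids)          # returns a new sorted list; ids is unchanged
result = ids.sort()             # sorts ids in place and returns None

print(ids)
print(result)
```

Because the list methods modify the list in place and return None, writing `ids = ids.sort()` would lose the list.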
Tuples: sort of an immutable list
>>> yellow = (255, 255, 0) # r, g, b
>>> one = (1,)
>>> yellow[0]
255
>>> yellow[1:]
(255, 0)
>>> yellow[0] = 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
Very common in string interpolation:
>>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden", 57.7056)
'Andrew lives in Sweden at latitude 57.7'
Zipping lists together
>>> names
['ben', 'chen', 'yaqin']
>>> gender = [0, 0, 1]
>>> zip(names, gender)
[('ben', 0), ('chen', 0), ('yaqin', 1)]
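A note for readers on Python 3, sketched below: there zip returns a lazy iterator rather than a list, so wrap it in list() to see the pairs. Passing the pairs to dict() to build a lookup table is an extra step added here for illustration.

```python
names = ["ben", "chen", "yaqin"]
gender = [0, 0, 1]

pairs = list(zip(names, gender))    # materialize the pairs as a list of tuples
lookup = dict(zip(names, gender))   # or build a name -> gender dictionary

print(pairs)
print(lookup["yaqin"])
```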
Dictionaries
- Dictionaries are lookup tables.
- They map from a “key” to a “value”.
symbol_to_name = {
    "H": "hydrogen",
    "He": "helium",
    "Li": "lithium",
    "C": "carbon",
    "O": "oxygen",
    "N": "nitrogen"
}
- Duplicate keys are not allowed
- Duplicate values are just fine
Keys can be any immutable value
numbers, strings, tuples, frozenset,
not list, dictionary, set, ...
atomic_number_to_name = {
    1: "hydrogen",
    6: "carbon",
    7: "nitrogen",
    8: "oxygen",
}
nobel_prize_winners = {
    (1979, "physics"): ["Glashow", "Salam", "Weinberg"],
    (1964, "chemistry"): ["Hodgkin"],
    (1983, "medicine"): ["McClintock"],
}
A set is an unordered collection
with no duplicate elements.
Dictionary
>>> symbol_to_name["C"]                 # get the value for a given key
'carbon'
>>> "O" in symbol_to_name, "U" in symbol_to_name    # test if the key exists
(True, False)
>>> "oxygen" in symbol_to_name          # "in" only checks the keys, not the values
False
>>> symbol_to_name["P"]                 # [] lookup failures raise an exception
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'P'
>>> symbol_to_name.get("P", "unknown")  # use ".get()" if you want to return a default value
'unknown'
>>> symbol_to_name.get("C", "unknown")
'carbon'
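The two lookup styles above can be compared side by side. This Python 3 sketch uses a shortened copy of the table; the function name `describe` is hypothetical.

```python
symbol_to_name = {"H": "hydrogen", "C": "carbon", "O": "oxygen"}


def describe(symbol):
    """Look up a symbol with [], falling back to 'unknown' on KeyError."""
    try:
        return symbol_to_name[symbol]
    except KeyError:
        return "unknown"


with_get = symbol_to_name.get("P", "unknown")   # same result in one call
with_try = describe("P")

print(with_get, with_try, describe("C"))
```

Use [] when a missing key is a genuine error, and .get() when a default is acceptable.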
Some useful dictionary methods
>>> symbol_to_name.keys()
['C', 'H', 'O', 'N', 'Li', 'He']
>>> symbol_to_name.values()
['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']
>>> symbol_to_name.update( {"P": "phosphorus", "S": "sulfur"} )
>>> symbol_to_name.items()
[('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), ('P',
'phosphorus'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')]
>>> del symbol_to_name['C']
>>> symbol_to_name
{'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'P': 'phosphorus', 'S': 'sulfur', 'Li': 'lithium', 'He': 'helium'}
Control Flow
Things that are False
- The boolean value False
- The numbers 0 (integer), 0.0 (float) and 0j (complex).
- The empty string "".
- The empty list [], empty dictionary {} and empty set set().
Things that are True
- The boolean value True
- All non-zero numbers.
- Any string containing at least one character.
- A non-empty data structure.
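The two lists above can be checked directly with bool(). This is a small Python 3 sketch; the sample values are picked here to mirror the slide.

```python
values = [False, 0, 0.0, 0j, "", [], {}, set(),
          True, 1, -2.5, "a", [0], {"k": 0}]

falsy = [v for v in values if not v]   # everything from "Things that are False"
truthy = [v for v in values if v]      # everything from "Things that are True"

print(len(falsy), len(truthy))
print(bool([0]))    # a list holding a zero is still non-empty
```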
If
>>> smiles = "BrC1=CC=C(C=C1)NN.Cl"
>>> bool(smiles)
True
>>> not bool(smiles)
False
>>> if not smiles:
...     print "The SMILES string is empty"
...
- The “else” case is always optional
Use “elif” to chain subsequent tests
>>> mode = "absolute"
>>> if mode == "canonical":
...     smiles = "canonical"
... elif mode == "isomeric":
...     smiles = "isomeric"
... elif mode == "absolute":
...     smiles = "absolute"
... else:
...     raise TypeError("unknown mode")
...
>>> smiles
'absolute'
>>>
“raise” is the Python way to raise exceptions
Boolean logic
Python expressions can have “and”s and “or”s:
if (ben <= 5 and chen >= 10 or
    chen == 500 and ben != 5):
    print "Ben and Chen"
Range Test
if (3 <= Time <= 5):
    print "Office Hour"
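Two points from the slides above, sketched in Python 3: a chained comparison is shorthand for two comparisons joined by "and", and "and" binds more tightly than "or". The function name `is_office_hour` and the sample values are illustrative assumptions.

```python
def is_office_hour(hour):
    """True when hour is between 3 and 5, inclusive."""
    return 3 <= hour <= 5    # same as (3 <= hour) and (hour <= 5)


ben, chen = 7, 500

# Without parentheses, each "and" group is evaluated before the "or".
implicit = ben <= 5 and chen >= 10 or chen == 500 and ben != 5
explicit = (ben <= 5 and chen >= 10) or (chen == 500 and ben != 5)

print(is_office_hour(4), is_office_hour(6))
print(implicit, explicit)
```

Adding the parentheses explicitly, as in the second expression, costs nothing and makes the intent easier to read.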
For
>>> names = ["Ben", "Chen", "Yaqin"]
>>> for name in names:
...     print name
...
Ben
Chen
Yaqin
Tuple assignment in for loops
data = [ ("C20H20O3", 308.371),
         ("C22H20O2", 316.393),
         ("C24H40N4O2", 416.6),
         ("C14H25N5O3", 311.38),
         ("C15H20O2", 232.3181)]
for (formula, mw) in data:
    print "The molecular weight of %s is %s" % (formula, mw)
The molecular weight of C20H20O3 is 308.371
The molecular weight of C22H20O2 is 316.393
The molecular weight of C24H40N4O2 is 416.6
The molecular weight of C14H25N5O3 is 311.38
The molecular weight of C15H20O2 is 232.3181
Break, continue
>>> for value in [3, 1, 4, 1, 5, 9, 2]:
...     print "Checking", value
...     if value > 8:
...         print "Exiting for loop"
...         break               # use "break" to stop the for loop
...     elif value < 3:
...         print "Ignoring"
...         continue            # use "continue" to stop processing the current item
...     print "The square is", value**2
...
Checking 3
The square is 9
Checking 1
Ignoring
Checking 4
The square is 16
Checking 1
Ignoring
Checking 5
The square is 25
Checking 9
Exiting for loop
>>>
Range()
- “range” creates a list of numbers in a specified range
- range([start,] stop[, step]) -> list of integers
- When step is given, it specifies the increment (or decrement).
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(5, 10)
[5, 6, 7, 8, 9]
>>> range(0, 10, 2)
[0, 2, 4, 6, 8]
How to get every second element in a list?
for i in range(0, len(data), 2):
    print data[i]
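The same question can also be answered with a slice step. The Python 3 sketch below compares both approaches on a made-up list. Note that in Python 3 range returns a lazy sequence rather than a list, so use list(range(5)) to see the numbers.

```python
data = ["a", "b", "c", "d", "e"]

by_index = [data[i] for i in range(0, len(data), 2)]   # index-based, as on the slide
by_slice = data[::2]                                   # slice with a step of 2

print(by_index)
print(by_slice)
print(list(range(0, 10, 2)))
```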
Reading files
>>> f = open("names.txt")
>>> f.readline()
'Yaqin\n'
Quick Way
>>> lst = [ x for x in open("text.txt","r").readlines() ]
>>> lst
['Chen Lin\n', 'clin@brandeis.edu\n', 'Volen 110\n', 'Office Hour: Thurs. 3-5\n', '\n',
 'Yaqin Yang\n', 'yaqin@brandeis.edu\n', 'Volen 110\n', 'Office Hour: Tues. 3-5\n']
Ignore the header?
for (i, line) in enumerate(open('text.txt', "r").readlines()):
    if i == 0: continue
    print line
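An alternative way to skip a header is sketched below in Python 3: open the file in a with block (so it is closed automatically) and consume the first line with next(). The file contents and the temporary path are made up so that the example runs on its own.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "text.txt")
with open(path, "w") as f:
    f.write("name\nBen\nChen\nYaqin\n")

rows = []
with open(path) as f:       # the file is closed when the block ends
    next(f, None)           # consume the header line, if there is one
    for line in f:
        rows.append(line.rstrip("\n"))

print(rows)
```

Iterating over the file object reads one line at a time, so the whole file never has to be held in memory as it is with readlines().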
Using dictionaries to count occurrences
>>> name_count = {}
>>> for line in open('names.txt'):
...     name = line.strip()
...     name_count[name] = name_count.get(name, 0) + 1
...
>>> for (name, count) in name_count.items():
...     print name, count
...
Chen 3
Ben 3
Yaqin 3
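For comparison, the standard library's collections.Counter does the same counting. The Python 3 sketch below runs both versions on an in-memory list (made up to match the output above) instead of a file.

```python
from collections import Counter

names = ["Chen", "Ben", "Yaqin", "Ben", "Chen", "Yaqin", "Chen", "Ben", "Yaqin"]

# Manual counting with dict.get, as on the slide.
name_count = {}
for name in names:
    name_count[name] = name_count.get(name, 0) + 1

# Counter builds the same mapping in one step.
counter = Counter(names)

print(name_count)
print(counter.most_common(1))
```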
File Output
input_file = open("in.txt")
output_file = open("out.txt", "w")
for line in input_file:
    output_file.write(line)
input_file.close()
output_file.close()     # make sure everything is written to disk
File modes:
“w” = “write mode”
“a” = “append mode”
“wb” = “write in binary”
“r” = “read mode” (default)
“rb” = “read in binary”
“U” = “read files with Unix or Windows line endings”
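A sketch of the copy loop using with blocks, which close both files automatically, followed by an append in "a" mode. It is written for Python 3, where text mode already translates line endings by default; the file names and contents are placeholders created in a temporary folder.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
in_path = os.path.join(folder, "in.txt")
out_path = os.path.join(folder, "out.txt")

with open(in_path, "w") as f:               # "w" creates or truncates the file
    f.write("first\nsecond\n")

with open(in_path) as input_file, open(out_path, "w") as output_file:
    for line in input_file:
        output_file.write(line)             # write() does not add a newline itself

with open(out_path, "a") as output_file:    # "a" keeps existing content
    output_file.write("third\n")

with open(out_path) as f:
    copied = f.read()

print(copied)
```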
Modules
When a Python program starts, it has access only to a basic set of functions and classes.
(“int”, “dict”, “len”, “sum”, “range”, ...)
“Modules” contain additional functionality.
Use “import” to tell Python to load a module.
>>> import math
>>> import nltk
import the math module
>>> import math
>>> math.pi
3.1415926535897931
>>> math.cos(0)
1.0
>>> math.cos(math.pi)
-1.0
>>> dir(math)
['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh',
'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos',
'cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod',
'frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10',
'log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan',
'tanh', 'trunc']
>>> help(math)
>>> help(math.cos)
“import” and “from ... import ...”
>>> import math                 # refer to the function as math.cos
>>> from math import cos, pi    # refer to the function as cos
>>> from math import *
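The import forms differ only in which names end up in your namespace. The Python 3 sketch below calls the same function three ways; the alias `m` is an arbitrary choice for illustration.

```python
import math                  # access everything through the module name
from math import cos, pi     # bring selected names in directly
import math as m             # same module under a shorter alias

a = math.cos(math.pi)
b = cos(pi)
c = m.cos(m.pi)

print(a, b, c)
```

`from math import *` also works, but it makes it harder to tell where a name came from, so the explicit forms are usually easier to read.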
Classes
class ClassName(object):
    <statement-1>
    . . .
    <statement-N>
class MyClass(object):
    """A simple example class"""
    i = 12345
    def f(self):
        return self.i
class DerivedClassName(BaseClassName):
    <statement-1>
    . . .
    <statement-N>
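To make the templates above concrete, here is a minimal sketch of a base class and two derived classes. The class names `Shape`, `Triangle` and `Square` are invented for this example and do not come from the lecture.

```python
class Shape(object):
    """Base class: stores a name and describes itself."""
    sides = 0                         # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name              # instance attribute, set per object

    def describe(self):
        return "%s with %d sides" % (self.name, self.sides)


class Triangle(Shape):
    sides = 3                         # override the class attribute only


class Square(Shape):
    sides = 4

    def describe(self):
        # Reuse the base-class method and add to its result.
        return Shape.describe(self) + " of equal length"


print(Triangle("triangle").describe())
print(Square("square").describe())
```

A derived class inherits every method it does not override, which is why Triangle needs no describe method of its own.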
NLTK
http://www.nltk.org/book
NLTK is on berry patch machines!
>>> from nltk.book import *
>>> text1
<Text: Moby Dick by Herman Melville 1851>
>>> text1.name
'Moby Dick by Herman Melville 1851'
>>> text1.concordance("monstrous")
>>> dir(text1)
>>> text1.tokens
>>> text1.index("my")
4647
>>> sent2
['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in',
'Sussex', '.']
Classify Text
>>> def gender_features(word):
...     return {'last_letter': word[-1]}
...
>>> gender_features('Shrek')
{'last_letter': 'k'}
>>> from nltk.corpus import names
>>> import random
>>> names = ([(name, 'male') for name in names.words('male.txt')] +
...          [(name, 'female') for name in names.words('female.txt')])
>>> random.shuffle(names)
Featurize, train, test, predict
>>> import nltk
>>> featuresets = [(gender_features(n), g) for (n,g) in names]
>>> train_set, test_set = featuresets[500:], featuresets[:500]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
>>> print nltk.classify.accuracy(classifier, test_set)
0.726
>>> classifier.classify(gender_features('Neo'))
'male'
from nltk.corpus import reuters
- Reuters Corpus: 10,788 news documents, 1.3 million words.
- The documents have been classified into 90 topics
- Grouped into 2 sets, "training" and "test"
- Categories overlap with each other
http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html
Reuters
>>> from nltk.corpus import reuters
>>> reuters.fileids()
['test/14826', 'test/14828', 'test/14829', 'test/14832', ...]
>>> reuters.categories()
['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut',
'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton', 'cotton-oil',
'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]

More Related Content

PPT
Python tutorial
PPT
Python tutorial
PPT
Python tutorial
PPT
Introduction to Python Programming.ppt
PPTX
Python material
PDF
Learn 90% of Python in 90 Minutes
PPT
Python tutorial
PPTX
Python Workshop
Python tutorial
Python tutorial
Python tutorial
Introduction to Python Programming.ppt
Python material
Learn 90% of Python in 90 Minutes
Python tutorial
Python Workshop

Similar to Python programming tutorial for beginners (20)

PPTX
Python chapter 2
PPTX
python chapter 1
PDF
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
PPTX
Python Workshop - Learn Python the Hard Way
PPT
PPTX
Python-Dictionaries.pptx easy way to learn dictionaries
PDF
Becoming a Pythonist
PPTX
Introduction to python programming 1
PDF
Introduction to R
PPTX
P3 2018 python_regexes
PDF
Τα Πολύ Βασικά για την Python
PPTX
Python language data types
PPTX
Python language data types
PPTX
Python language data types
PPTX
Python language data types
PPTX
Python language data types
PPTX
Python language data types
PPTX
Python language data types
PDF
Datatypes in python
PPTX
Python-The programming Language
Python chapter 2
python chapter 1
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
Python Workshop - Learn Python the Hard Way
Python-Dictionaries.pptx easy way to learn dictionaries
Becoming a Pythonist
Introduction to python programming 1
Introduction to R
P3 2018 python_regexes
Τα Πολύ Βασικά για την Python
Python language data types
Python language data types
Python language data types
Python language data types
Python language data types
Python language data types
Python language data types
Datatypes in python
Python-The programming Language
Ad

Recently uploaded (20)

PDF
Microbial disease of the cardiovascular and lymphatic systems
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PDF
102 student loan defaulters named and shamed – Is someone you know on the list?
PPTX
Pharma ospi slides which help in ospi learning
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
Insiders guide to clinical Medicine.pdf
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
Cell Types and Its function , kingdom of life
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPTX
master seminar digital applications in india
PPTX
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
Basic Mud Logging Guide for educational purpose
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
Classroom Observation Tools for Teachers
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
Supply Chain Operations Speaking Notes -ICLT Program
Microbial disease of the cardiovascular and lymphatic systems
Renaissance Architecture: A Journey from Faith to Humanism
102 student loan defaulters named and shamed – Is someone you know on the list?
Pharma ospi slides which help in ospi learning
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Insiders guide to clinical Medicine.pdf
PPH.pptx obstetrics and gynecology in nursing
Cell Types and Its function , kingdom of life
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
master seminar digital applications in india
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
Basic Mud Logging Guide for educational purpose
VCE English Exam - Section C Student Revision Booklet
Classroom Observation Tools for Teachers
FourierSeries-QuestionsWithAnswers(Part-A).pdf
Anesthesia in Laparoscopic Surgery in India
2.FourierTransform-ShortQuestionswithAnswers.pdf
Supply Chain Operations Speaking Notes -ICLT Program
Ad

Python programming tutorial for beginners

  • 1. Introduction to Python Chen Lin Chen Lin clin@brandeis.edu clin@brandeis.edu COSI 134a COSI 134a Volen 110 Volen 110 Office Hour: Thurs. 3-5 Office Hour: Thurs. 3-5
  • 2. For More Information? http://python.org/ - documentation, tutorials, beginners guide, core distribution, ... Books include:  Learning Python by Mark Lutz  Python Essential Reference by David Beazley  Python Cookbook, ed. by Martelli, Ravenscroft and Ascher  (online at http://code.activestate.com/recipes/langs/python/)  http://wiki.python.org/moin/PythonBooks
  • 3. Python Videos Python Videos http://showmedo.com/videotutorials/python “5 Minute Overview (What Does Python Look Like?)” “Introducing the PyDev IDE for Eclipse” “Linear Algebra with Numpy” And many more
  • 4. 4 Major Versions of Python 4 Major Versions of Python “Python” or “CPython” is written in C/C++ - Version 2.7 came out in mid-2010 - Version 3.1.2 came out in early 2010 “Jython” is written in Java for the JVM “IronPython” is written in C# for the .Net environment Go To Website
  • 5. Development Environments Development Environments what IDE to use? what IDE to use? http://stackoverflow.com/questions/81584 http://stackoverflow.com/questions/81584 1. PyDev with Eclipse 2. Komodo 3. Emacs 4. Vim 5. TextMate 6. Gedit 7. Idle 8. PIDA (Linux)(VIM Based) 9. NotePad++ (Windows) 10.BlueFish (Linux)
  • 7. Python Interactive Shell Python Interactive Shell % python % python Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information. Type "help", "copyright", "credits" or "license" for more information. >>> >>> You can type things directly into a running Python session You can type things directly into a running Python session >>> 2+3*4 >>> 2+3*4 14 14 >>> name = "Andrew" >>> name = "Andrew" >>> name >>> name 'Andrew' 'Andrew' >>> print "Hello", name >>> print "Hello", name Hello Andrew Hello Andrew >>> >>>
  • 8.  Background Background  Data Types/Structure Data Types/Structure  Control flow Control flow  File I/O File I/O  Modules Modules  Class Class  NLTK NLTK
  • 9. List List A compound data type: A compound data type: [0] [0] [2.3, 4.5] [2.3, 4.5] [5, "Hello", "there", 9.8] [5, "Hello", "there", 9.8] [] [] Use len() to get the length of a list Use len() to get the length of a list >>> names = [“Ben", “Chen", “Yaqin"] >>> names = [“Ben", “Chen", “Yaqin"] >>> len(names) >>> len(names) 3 3
  • 10. Use [ ] to index items in the list Use [ ] to index items in the list >>> names[0] >>> names[0] ‘ ‘Ben' Ben' >>> names[1] >>> names[1] ‘ ‘Chen' Chen' >>> names[2] >>> names[2] ‘ ‘Yaqin' Yaqin' >>> names[3] >>> names[3] Traceback (most recent call last): Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in <module> IndexError: list index out of range IndexError: list index out of range >>> names[-1] >>> names[-1] ‘ ‘Yaqin' Yaqin' >>> names[-2] >>> names[-2] ‘ ‘Chen' Chen' >>> names[-3] >>> names[-3] ‘ ‘Ben' Ben' [0] is the first item. [1] is the second item ... Out of range values raise an exception Negative values go backwards from the last element.
  • 11. Strings share many features with lists Strings share many features with lists >>> smiles = "C(=N)(N)N.C(=O)(O)O" >>> smiles = "C(=N)(N)N.C(=O)(O)O" >>> smiles[0] >>> smiles[0] 'C' 'C' >>> smiles[1] >>> smiles[1] '(' '(' >>> smiles[-1] >>> smiles[-1] 'O' 'O' >>> smiles[1:5] >>> smiles[1:5] '(=N)' '(=N)' >>> smiles[10:-4] >>> smiles[10:-4] 'C(=O)' 'C(=O)' Use “slice” notation to get a substring
  • 12. String Methods: find, split String Methods: find, split smiles = "C(=N)(N)N.C(=O)(O)O" smiles = "C(=N)(N)N.C(=O)(O)O" >>> smiles.find("(O)") >>> smiles.find("(O)") 15 15 >>> smiles.find(".") >>> smiles.find(".") 9 9 >>> smiles.find(".", 10) >>> smiles.find(".", 10) -1 -1 >>> smiles.split(".") >>> smiles.split(".") ['C(=N)(N)N', 'C(=O)(O)O'] ['C(=N)(N)N', 'C(=O)(O)O'] >>> >>> Use “find” to find the start of a substring. Start looking at position 10. Find returns -1 if it couldn’t find a match. Split the string into parts with “.” as the delimiter
  • 13. String operators: in, not in
        if "Br" in "Brother":
            print "contains brother"

        email_address = "clin"
        if "@" not in email_address:
            email_address += "@brandeis.edu"
  • 14. String Methods: “strip”, “rstrip”, “lstrip” remove whitespace or selected characters
        >>> line = " # This is a comment line \n"
        >>> line.strip()
        '# This is a comment line'
        >>> line.rstrip()
        ' # This is a comment line'
        >>> line.rstrip("\n")
        ' # This is a comment line '
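A common use of `strip` is cleaning lines before deciding whether to keep them. The sketch below uses Python 3 syntax and a made-up list of lines; it is one possible pattern, not the deck's own code:

```python
# Hypothetical input lines; in practice they would come from a file.
lines = [" # This is a comment line \n", "C(=O)(O)O\n", "\n"]

records = []
for line in lines:
    text = line.strip()                   # drop surrounding whitespace
    if not text or text.startswith("#"):
        continue                          # skip blank and comment lines
    records.append(text)

# A string argument is treated as a set of characters to remove.
comment_text = lines[0].strip().lstrip("# ")

print(records)       # ['C(=O)(O)O']
print(comment_text)  # This is a comment line
```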
  • 15. More String methods
        email.startswith("c")     # returns True or False
        email.endswith("u")       # returns True or False
        >>> "%s@brandeis.edu" % "clin"
        'clin@brandeis.edu'
        >>> names = ["Ben", "Chen", "Yaqin"]
        >>> ", ".join(names)
        'Ben, Chen, Yaqin'
        >>> "chen".upper()
        'CHEN'
  • 16. Unexpected things about strings
        >>> s = "andrew"
        >>> s[0] = "A"
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        TypeError: 'str' object does not support item assignment
        >>> s = "A" + s[1:]
        >>> s
        'Andrew'
    Strings are read-only (immutable); build a new string instead.
  • 17. “\” is for special characters
        \n -> newline
        \t -> tab
        \\ -> backslash
        ...
    But Windows uses backslash for directories!
        filename = "M:\nickel_project\reactive.smi"      # DANGER!
        filename = "M:\\nickel_project\\reactive.smi"    # Better!
        filename = "M:/nickel_project/reactive.smi"      # Usually works
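Raw string literals are another way around the backslash problem, and comparing the variants shows what goes wrong. A minimal sketch in Python 3 syntax; the path is the slide's example and the variable names are illustrative:

```python
from pathlib import PureWindowsPath

dangerous = "M:\nickel_project\reactive.smi"    # \n and \r become control characters
escaped = "M:\\nickel_project\\reactive.smi"    # each backslash written twice
raw = r"M:\nickel_project\reactive.smi"         # raw literal: backslashes kept as typed

print(escaped == raw)      # True
print("\n" in dangerous)   # True: the path now contains a newline

# PureWindowsPath parses Windows-style paths on any operating system.
print(PureWindowsPath(raw).name)  # reactive.smi
```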
  • 18. Lists are mutable - some useful methods
        >>> ids = ["9pti", "2plv", "1crn"]
        >>> ids.append("1alm")          # append an element
        >>> ids
        ['9pti', '2plv', '1crn', '1alm']
        >>> del ids[0]                  # remove an element
        >>> ids
        ['2plv', '1crn', '1alm']
        >>> ids.sort()                  # sort by default order
        >>> ids
        ['1alm', '1crn', '2plv']
        >>> ids.reverse()               # reverse the elements in a list
        >>> ids
        ['2plv', '1crn', '1alm']
        >>> ids.insert(0, "9pti")       # insert an element at a specified position (slower than .append())
        >>> ids
        ['9pti', '2plv', '1crn', '1alm']
    ids.extend(L) extends the list by appending all the items in the given list L; it is equivalent to ids[len(ids):] = L.
  • 19. Tuples: sort of an immutable list
        >>> yellow = (255, 255, 0)   # r, g, b
        >>> one = (1,)
        >>> yellow[0]
        255
        >>> yellow[1:]
        (255, 0)
        >>> yellow[0] = 0
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        TypeError: 'tuple' object does not support item assignment
    Very common in string interpolation:
        >>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden", 57.7056)
        'Andrew lives in Sweden at latitude 57.7'
  • 20. Zipping lists together
        >>> names
        ['ben', 'chen', 'yaqin']
        >>> gender = [0, 0, 1]
        >>> zip(names, gender)
        [('ben', 0), ('chen', 0), ('yaqin', 1)]
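The session above shows Python 2, where `zip` returns a list. In Python 3 it returns a lazy iterator, so a sketch of the same idea needs an explicit `list()` or `dict()` call:

```python
names = ["ben", "chen", "yaqin"]
gender = [0, 0, 1]

pairs = list(zip(names, gender))           # materialize the iterator
name_to_gender = dict(zip(names, gender))  # or build a lookup table directly

print(pairs)           # [('ben', 0), ('chen', 0), ('yaqin', 1)]
print(name_to_gender)  # {'ben': 0, 'chen': 0, 'yaqin': 1}
```

`zip` stops at the shorter input, so lists of unequal length are silently truncated.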
  • 21. Dictionaries
    Dictionaries are lookup tables. They map from a “key” to a “value”.
        symbol_to_name = {
            "H": "hydrogen",
            "He": "helium",
            "Li": "lithium",
            "C": "carbon",
            "O": "oxygen",
            "N": "nitrogen"
        }
    Duplicate keys are not allowed.
    Duplicate values are just fine.
  • 22. Keys can be any immutable value
    Allowed: numbers, strings, tuples, frozenset. Not allowed: list, dictionary, set, ...
        atomic_number_to_name = {
            1: "hydrogen",
            6: "carbon",
            7: "nitrogen",
            8: "oxygen",
        }
        nobel_prize_winners = {
            (1979, "physics"): ["Glashow", "Salam", "Weinberg"],
            (1964, "chemistry"): ["Hodgkin"],
            (1983, "physiology or medicine"): ["McClintock"],
        }
    A set is an unordered collection with no duplicate elements.
  • 23. Dictionary
        >>> symbol_to_name["C"]                  # get the value for a given key
        'carbon'
        >>> "O" in symbol_to_name, "U" in symbol_to_name    # test if the key exists
        (True, False)
        >>> "oxygen" in symbol_to_name           # “in” only checks the keys, not the values
        False
        >>> symbol_to_name["P"]                  # [] lookup failures raise an exception
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        KeyError: 'P'
        >>> symbol_to_name.get("P", "unknown")   # use “.get()” if you want a default value
        'unknown'
        >>> symbol_to_name.get("C", "unknown")
        'carbon'
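The two lookup styles above can be wrapped side by side to show that they agree. A sketch in Python 3 syntax with a shortened copy of the slide's table; both function names are hypothetical:

```python
symbol_to_name = {"H": "hydrogen", "He": "helium", "C": "carbon"}

def name_with_get(symbol):
    # .get() returns the default instead of raising KeyError
    return symbol_to_name.get(symbol, "unknown")

def name_with_try(symbol):
    # [] lookup raises KeyError for a missing key, so catch it explicitly
    try:
        return symbol_to_name[symbol]
    except KeyError:
        return "unknown"

for symbol in ("C", "P"):
    print(symbol, name_with_get(symbol), name_with_try(symbol))
```

`.get()` is the shorter choice when a default is all that is needed; the try/except form is useful when a missing key should trigger extra handling.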
  • 24. Some useful dictionary methods
        >>> symbol_to_name.keys()
        ['C', 'H', 'O', 'N', 'Li', 'He']
        >>> symbol_to_name.values()
        ['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']
        >>> symbol_to_name.update( {"P": "phosphorus", "S": "sulfur"} )
        >>> symbol_to_name.items()
        [('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), ('P', 'phosphorus'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')]
        >>> del symbol_to_name['C']
        >>> symbol_to_name
        {'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'P': 'phosphorus', 'S': 'sulfur', 'Li': 'lithium', 'He': 'helium'}
  • 25. Background, Data Types/Structure (list, string, tuple, dictionary), Control flow, File I/O, Modules, Class, NLTK
  • 26. Control Flow
    Things that are False:
        The boolean value False.
        The numbers 0 (integer), 0.0 (float) and 0j (complex).
        The empty string "".
        The empty list [], empty dictionary {} and empty set set().
    Things that are True:
        The boolean value True.
        All non-zero numbers.
        Any string containing at least one character.
        A non-empty data structure.
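The rules above can be checked directly with `bool()`. A short sketch in Python 3 syntax; the two lists are illustrative, and `None` and the empty tuple are added here although the slide does not list them:

```python
# Values that evaluate as false in a condition.
falsy_examples = [False, 0, 0.0, 0j, "", [], {}, set(), (), None]

# Values that evaluate as true; note that "0" and [0] are non-empty.
truthy_examples = [True, 1, -1, 0.5, "0", " ", [0], {"a": 1}]

for value in falsy_examples:
    print(repr(value), bool(value))   # every line ends with False

for value in truthy_examples:
    print(repr(value), bool(value))   # every line ends with True
```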
  • 27. If
        >>> smiles = "BrC1=CC=C(C=C1)NN.Cl"
        >>> bool(smiles)
        True
        >>> not bool(smiles)
        False
        >>> if not smiles:
        ...     print "The SMILES string is empty"
        ...
    The “else” case is always optional.
  • 28. Use “elif” to chain subsequent tests
        >>> mode = "absolute"
        >>> if mode == "canonical":
        ...     smiles = "canonical"
        ... elif mode == "isomeric":
        ...     smiles = "isomeric"
        ... elif mode == "absolute":
        ...     smiles = "absolute"
        ... else:
        ...     raise TypeError("unknown mode")
        ...
        >>> smiles
        'absolute'
    “raise” is the Python way to raise exceptions.
  • 29. Boolean logic
    Python expressions can have “and”s and “or”s:
        if (ben <= 5 and chen >= 10 or
            chen == 500 and ben != 5):
            print "Ben and Chen"
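In the expression above, `and` binds more tightly than `or`, so the condition groups as two `and` clauses joined by `or`. A sketch in Python 3 syntax with made-up values for `ben` and `chen` that makes the grouping visible:

```python
ben, chen = 5, 500   # hypothetical values chosen to expose the difference

implicit = ben <= 5 and chen >= 10 or chen == 500 and ben != 5
explicit = (ben <= 5 and chen >= 10) or (chen == 500 and ben != 5)
regrouped = ben <= 5 and (chen >= 10 or chen == 500) and ben != 5

print(implicit)   # True
print(explicit)   # True: same grouping as the implicit form
print(regrouped)  # False: moving the parentheses changes the result
```

Writing the parentheses explicitly costs nothing and spares the reader from recalling the precedence rules.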
  • 30. Range Test
        if (3 <= Time <= 5):
            print "Office Hour"
  • 31. For
        >>> names = ["Ben", "Chen", "Yaqin"]
        >>> for name in names:
        ...     print name
        ...
        Ben
        Chen
        Yaqin
  • 32. Tuple assignment in for loops
        data = [ ("C20H20O3", 308.371),
                 ("C22H20O2", 316.393),
                 ("C24H40N4O2", 416.6),
                 ("C14H25N5O3", 311.38),
                 ("C15H20O2", 232.3181)]
        for (formula, mw) in data:
            print "The molecular weight of %s is %s" % (formula, mw)

        The molecular weight of C20H20O3 is 308.371
        The molecular weight of C22H20O2 is 316.393
        The molecular weight of C24H40N4O2 is 416.6
        The molecular weight of C14H25N5O3 is 311.38
        The molecular weight of C15H20O2 is 232.3181
  • 33. Break, continue
        >>> for value in [3, 1, 4, 1, 5, 9, 2]:
        ...     print "Checking", value
        ...     if value > 8:
        ...         print "Exiting for loop"
        ...         break
        ...     elif value < 3:
        ...         print "Ignoring"
        ...         continue
        ...     print "The square is", value**2
        ...
        Checking 3
        The square is 9
        Checking 1
        Ignoring
        Checking 4
        The square is 16
        Checking 1
        Ignoring
        Checking 5
        The square is 25
        Checking 9
        Exiting for loop
        >>>
    Use “break” to stop the for loop.
    Use “continue” to stop processing the current item.
  • 34. Range()
    “range” creates a list of numbers in a specified range.
    range([start,] stop[, step]) -> list of integers
    When step is given, it specifies the increment (or decrement).
        >>> range(5)
        [0, 1, 2, 3, 4]
        >>> range(5, 10)
        [5, 6, 7, 8, 9]
        >>> range(0, 10, 2)
        [0, 2, 4, 6, 8]
    How to get every second element in a list?
        for i in range(0, len(data), 2):
            print data[i]
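The slide describes Python 2, where `range` returns a list. In Python 3 it returns a lazy sequence, so `list()` is needed to see the numbers. The sketch below, with a made-up `data` list, also shows that a slice with a step gives the same "every second element" result without indexing:

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(0, 10, 2)))  # [0, 2, 4, 6, 8]
print(list(range(10, 0, -3))) # [10, 7, 4, 1]  (negative step counts down)

data = ["a", "b", "c", "d", "e"]   # hypothetical list

by_index = [data[i] for i in range(0, len(data), 2)]
by_slice = data[::2]               # same elements, no index arithmetic

print(by_index)  # ['a', 'c', 'e']
print(by_slice)  # ['a', 'c', 'e']
```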
  • 35. Background, Data Types/Structure, Control flow, File I/O, Modules, Class, NLTK
  • 36. Reading files
        >>> f = open("names.txt")
        >>> f.readline()
        'Yaqin\n'
  • 37. Quick Way
        >>> lst = [ x for x in open("text.txt", "r").readlines() ]
        >>> lst
        ['Chen Lin\n', 'clin@brandeis.edu\n', 'Volen 110\n', 'Office Hour: Thurs. 3-5\n', '\n',
         'Yaqin Yang\n', 'yaqin@brandeis.edu\n', 'Volen 110\n', 'Office Hour: Tues. 3-5\n']
    Ignore the header?
        for (i, line) in enumerate(open('text.txt', "r").readlines()):
            if i == 0: continue
            print line
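The snippets above leave the file open until the interpreter closes it. A sketch of the same header-skipping idea in Python 3 syntax, using a `with` block so the file is closed promptly; the file contents and the temporary directory are made up so the example is self-contained:

```python
import os
import tempfile

contents = "Name\nBen\nChen\nYaqin\n"   # hypothetical file: one header line, three rows

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "text.txt")
    with open(path, "w") as handle:
        handle.write(contents)

    with open(path) as handle:          # closed automatically at the end of the block
        next(handle, None)              # consume the header; None avoids an error on an empty file
        body = [line.rstrip("\n") for line in handle]

print(body)  # ['Ben', 'Chen', 'Yaqin']
```

Iterating over the file object reads one line at a time, so this also avoids loading the whole file the way `readlines()` does.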
  • 38. Using dictionaries to count occurrences
        >>> name_count = {}
        >>> for line in open('names.txt'):
        ...     name = line.strip()
        ...     name_count[name] = name_count.get(name, 0) + 1
        ...
        >>> for (name, count) in name_count.items():
        ...     print name, count
        ...
        Chen 3
        Ben 3
        Yaqin 3
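The standard library's `collections.Counter` packages the same counting pattern. A sketch in Python 3 syntax that runs both approaches on a made-up list of lines standing in for names.txt:

```python
from collections import Counter

lines = ["Chen\n", "Ben\n", "Yaqin\n"] * 3   # hypothetical stand-in for the file

# Manual counting with dict.get, as on the slide.
manual_count = {}
for line in lines:
    name = line.strip()
    manual_count[name] = manual_count.get(name, 0) + 1

# Counter does the same bookkeeping and returns 0 for unseen keys.
name_count = Counter(line.strip() for line in lines)

print(manual_count)          # {'Chen': 3, 'Ben': 3, 'Yaqin': 3}
print(name_count["Chen"])    # 3
print(name_count["Nobody"])  # 0
```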
  • 39. File Output
        input_file = open("in.txt")
        output_file = open("out.txt", "w")
        for line in input_file:
            output_file.write(line)
    File modes:
        “w”  = “write mode”
        “a”  = “append mode”
        “wb” = “write in binary”
        “r”  = “read mode” (default)
        “rb” = “read in binary”
        “U”  = “read files with Unix or Windows line endings”
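The difference between the “w” and “a” modes is easiest to see by writing to the same file twice. A sketch in Python 3 syntax; the file name and contents are made up, and a temporary directory keeps the example self-contained:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "out.txt")   # hypothetical file name

    with open(path, "w") as output_file:     # "w" creates the file or truncates it
        output_file.write("first\n")
    with open(path, "a") as output_file:     # "a" keeps what is already there
        output_file.write("second\n")
    with open(path) as input_file:           # default mode is "r"
        after_append = input_file.read()

    with open(path, "w") as output_file:     # reopening with "w" discards the old lines
        output_file.write("third\n")
    with open(path) as input_file:
        after_rewrite = input_file.read()

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_rewrite))  # 'third\n'
```

The “U” mode listed above belongs to Python 2; Python 3 handles both line-ending styles in text mode by default, and the “U” flag was removed in Python 3.11.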
  • 40. Background, Data Types/Structure, Control flow, File I/O, Modules, Class, NLTK
  • 41. Modules
    When a Python program starts, it only has access to a basic set of functions and classes (“int”, “dict”, “len”, “sum”, “range”, ...).
    “Modules” contain additional functionality.
    Use “import” to tell Python to load a module.
        >>> import math
        >>> import nltk
  • 42. Import the math module
        >>> import math
        >>> math.pi
        3.1415926535897931
        >>> math.cos(0)
        1.0
        >>> math.cos(math.pi)
        -1.0
        >>> dir(math)
        ['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh',
         'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos',
         'cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod',
         'frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10',
         'log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan',
         'tanh', 'trunc']
        >>> help(math)
        >>> help(math.cos)
  • 43. “import” and “from ... import ...”
        >>> import math                  # names are used with the module prefix: math.cos
        >>> from math import cos, pi     # the listed names are used directly: cos
        >>> from math import *           # every public name in math is used directly
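The import forms differ only in which names end up in the current namespace; the objects themselves are the same. A brief sketch in Python 3 syntax (the alias `m` is an illustrative choice):

```python
import math                 # access through the module: math.cos
from math import cos, pi    # bind selected names directly
import math as m            # bind the module under a shorter alias

print(math.cos is cos)      # True: both names refer to the same function
print(m.pi == pi)           # True
print(cos(pi))              # -1.0
```

`from math import *` is best kept to interactive sessions, because it can silently overwrite names that are already defined.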
  • 44. Background, Data Types/Structure, Control flow, File I/O, Modules, Class, NLTK
  • 45. Classes
        class ClassName(object):
            <statement-1>
            . . .
            <statement-N>

        class MyClass(object):
            """A simple example class"""
            i = 12345
            def f(self):
                return self.i

        class DerivedClassName(BaseClassName):
            <statement-1>
            . . .
            <statement-N>
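To make the derived-class template concrete, the sketch below subclasses the slide's `MyClass`. It uses Python 3 syntax, and `Derived` with its behavior is a hypothetical example rather than part of the deck:

```python
class MyClass(object):
    """A simple example class"""
    i = 12345                      # class attribute shared by all instances

    def f(self):
        return self.i


class Derived(MyClass):            # hypothetical subclass
    def __init__(self, i):
        self.i = i                 # instance attribute shadows the class attribute

    def f(self):
        return 2 * super().f()     # reuse the base-class method, then extend it


print(MyClass().f())    # 12345
print(Derived(10).f())  # 20
print(isinstance(Derived(10), MyClass))  # True
```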
  • 46. Background, Data Types/Structure, Control flow, File I/O, Modules, Class, NLTK
  • 47. http://www.nltk.org/book
    NLTK is on berry patch machines!
        >>> from nltk.book import *
        >>> text1
        <Text: Moby Dick by Herman Melville 1851>
        >>> text1.name
        'Moby Dick by Herman Melville 1851'
        >>> text1.concordance("monstrous")
        >>> dir(text1)
        >>> text1.tokens
        >>> text1.index("my")
        4647
        >>> sent2
        ['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in',
         'Sussex', '.']
  • 48. Classify Text
        >>> def gender_features(word):
        ...     return {'last_letter': word[-1]}
        >>> gender_features('Shrek')
        {'last_letter': 'k'}
        >>> from nltk.corpus import names
        >>> import random
        >>> names = ([(name, 'male') for name in names.words('male.txt')] +
        ...          [(name, 'female') for name in names.words('female.txt')])
        >>> random.shuffle(names)
  • 49. Featurize, train, test, predict
        >>> import nltk
        >>> featuresets = [(gender_features(n), g) for (n, g) in names]
        >>> train_set, test_set = featuresets[500:], featuresets[:500]   # first 500 held out for testing
        >>> classifier = nltk.NaiveBayesClassifier.train(train_set)
        >>> print nltk.classify.accuracy(classifier, test_set)
        0.726
        >>> classifier.classify(gender_features('Neo'))
        'male'
  • 50. from nltk.corpus import reuters
    Reuters Corpus: 10,788 news documents, 1.3 million words.
    The documents have been classified into 90 topics.
    Grouped into 2 sets, "training" and "test".
    Categories overlap with each other.
    http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html
  • 51. Reuters
        >>> from nltk.corpus import reuters
        >>> reuters.fileids()
        ['test/14826', 'test/14828', 'test/14829', 'test/14832', ...]
        >>> reuters.categories()
        ['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut',
         'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton', 'cotton-oil',
         'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]