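Attached PDF:

Uses an embedded file

入门py翻译英文+格式 中字PPT大纲70p.pdf

The formatting can be seen in the file ↑

The outline is given below (it covers most of the text in the PPT)

English

Contents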

01-Intro

02-Expressions

03-Conditional

04-Functions

05-Iterations

06-Strings

07-Files

08-Lists

09-Dictionaries

10-Tuples

11-Regex

12-HTTP

13-WebServices

14-Objects

15-Databases

16-Data-Viz

Why Program?

Chapter 1

Computers Want to be Helpful…

• Computers are built for one purpose - to do things for us

• But we need to speak their language to describe what we want done

• Users have it easy - someone already put many different programs (instructions) into the computer and users just pick the ones they want to use

Programmers Anticipate Needs

• iPhone applications are a market

• iPhone applications have over 3 billion downloads

• Programmers have left their jobs to be full-time iPhone developers

• Programmers know the ways of the program

Users vs. Programmers

• Users see computers as a set of tools - word processor, spreadsheet, map, to-do list, etc.

• Programmers learn the computer “ways” and the computer language

• Programmers have some tools that allow them to build new tools

• Programmers sometimes write tools for lots of users and sometimes programmers write little “helpers” for themselves to automate a task

Why be a Programmer?

• To get some task done - we are the user and programmer

  • Clean up survey data

• To produce something for others to use - a programming job

  • Fix a performance problem in the Sakai software

  • Add a guestbook to a web site

What is Code? Software? A Program?

• A sequence of stored instructions

  • It is a little piece of our intelligence in the computer

  • We figure something out and then we encode it and then give it to someone else to save them the time and energy of figuring it out

• A piece of creative art - particularly when we do a good job on user experience

Programs for Humans…

Programs for Python…

Hardware Architecture

Definitions

• Central Processing Unit: Runs the Program - The CPU is always wondering “what to do next”. Not the brains exactly - very dumb but very very fast

• Input Devices: Keyboard, Mouse, Touch Screen

• Output Devices: Screen, Speakers, Printer, DVD Burner

• Main Memory: Fast small temporary storage - lost on reboot - aka RAM

• Secondary Memory: Slower large permanent storage - lasts until deleted - disk drive / memory stick

Totally Hot CPU

Hard Disk in Action

Python as a Language

Early Learner: Syntax Errors

• We need to learn the Python language so we can communicate our instructions to Python. In the beginning we will make lots of mistakes and speak gibberish like small children.

• When you make a mistake, the computer does not think you are “cute”. It says “syntax error” - given that it knows the language and you are just learning it. It seems like Python is cruel and unfeeling.

• You must remember that you are intelligent and can learn. The computer is simple and very fast, but cannot learn. So it is easier for you to learn Python than for the computer to learn English…

Talking to Python

What Do We Say?

Elements of Python

• Vocabulary / Words - Variables and Reserved words (Chapter 2)

• Sentence structure - valid syntax patterns (Chapters 3-5)

• Story structure - constructing a program for a purpose

Reserved Words

You cannot use reserved words as variable names / identifiers

Sentences or Lines

Programming Paragraphs

Python Scripts

• Interactive Python is good for experiments and programs of 3-4 lines long.

• Most programs are much longer, so we type them into a file and tell Python to run the commands in the file.

• In a sense, we are “giving Python a script”.

• As a convention, we add “.py” as the suffix on the end of these files to indicate they contain Python.

Interactive versus Script

• Interactive

  • You type directly to Python one line at a time and it responds

• Script

  • You enter a sequence of statements (lines) into a file using a text editor and tell Python to execute the statements in the file

Program Steps or Program Flow

• Like a recipe or installation instructions, a program is a sequence of steps to be done in order.

• Some steps are conditional - they may be skipped.

• Sometimes a step or group of steps is to be repeated.

• Sometimes we store a set of steps to be used over and over as needed several places throughout the program (Chapter 4).

Sequential Steps

Conditional Steps

Repeated Steps

Summary

• This is a quick overview of Chapter 1

• We will revisit these concepts throughout the course

• Focus on the big picture

Acknowledgements / Contributions

Variables, Expressions, and Statements

Chapter 2

Constants

• Fixed values such as numbers, letters, and strings, are called “constants” because their value does not change

• Numeric constants are as you expect

• String constants use single quotes (') or double quotes (")

Reserved Words

You cannot use reserved words as variable names / identifiers

Variables

• A variable is a named place in the memory where a programmer can store data and later retrieve the data using the variable “name”

• Programmers get to choose the names of the variables

• You can change the contents of a variable in a later statement

Python Variable Name Rules

Must start with a letter or underscore _

Must consist of letters, numbers, and underscores

Case Sensitive
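The rules above can be checked with a short sketch. The helper name is_valid_name is an assumption made for illustration; it relies on str.isidentifier() and the standard keyword module:

```python
import keyword

def is_valid_name(name):
    # A usable name is a valid identifier that is not a reserved word.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["spam", "_speed", "eggs23", "23spam", "#sign", "var.12", "class"]:
    print(candidate, is_valid_name(candidate))

# Case matters: these are three different variables.
spam = 1
Spam = 2
SPAM = 3
```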

Mnemonic Variable Names

• Since we programmers are given a choice in how we choose our variable names, there is a bit of “best practice”

• We name variables to help us remember what we intend to store in them (“mnemonic” = “memory aid”)

• This can confuse beginning students because well-named variables often “sound” so good that they must be keywords

Sentences or Lines

Assignment Statements

We assign a value to a variable using the assignment statement (=)

An assignment statement consists of an expression on the right-hand side and a variable to store the result
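A small sketch of an assignment statement, with illustrative values. The right-hand side is evaluated first using the old value of x, and the result is then stored back into x:

```python
x = 0.6
# The right side uses the old value of x (0.6); the result replaces it.
x = 3.9 * x * (1 - x)
print(x)
```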

Expressions…

Numeric Expressions

• Because of the lack of mathematical symbols on computer keyboards - we use “computer-speak” to express the classic math operations

• Asterisk is multiplication

• Exponentiation (raise to a power) looks different than in math

Order of Evaluation

• When we string operators together - Python must know which one to do first

• This is called “operator precedence”

• Which operator “takes precedence” over the others?

Operator Precedence Rules

Highest precedence rule to lowest precedence rule:

Parentheses are always respected

Exponentiation (raise to a power)

Multiplication, Division, and Remainder

Addition and Subtraction

Left to right
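As a worked sketch of these rules (the expression is chosen only for illustration), the same calculation is shown in one line and then broken into the steps Python takes:

```python
x = 1 + 2 ** 3 / 4 * 5

# The same evaluation, one rule at a time:
step1 = 2 ** 3        # exponentiation first: 8
step2 = step1 / 4     # then division, left to right: 2.0
step3 = step2 * 5     # then multiplication: 10.0
step4 = 1 + step3     # addition last: 11.0
print(x, step4)
```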

Operator Precedence

• Remember the rules top to bottom

• When writing code - use parentheses

• When writing code - keep mathematical expressions simple enough that they are easy to understand

• Break long series of mathematical operations up to make them more clear

What Does “Type” Mean?

• In Python variables, literals, and constants have a “type”

• Python knows the difference between an integer number and a string

• For example “+” means “addition” if something is a number and “concatenate” if something is a string

Type Matters

• Python knows what “type” everything is

• Some operations are prohibited

• You cannot “add 1” to a string

• We can ask Python what type something is by using the type() function
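A minimal sketch of how type affects what an operation means; the variable name eee is illustrative:

```python
print(type("hello"))   # a string
print(type(1))         # an integer

eee = "hello " + "there"   # + concatenates strings
try:
    eee = eee + 1          # adding 1 to a string is not allowed
except TypeError as err:
    print("Not allowed:", err)
```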

Several Types of Numbers

• Numbers have two main types

  • Integers are whole numbers: -14, -2, 0, 1, 100, 401233

  • Floating Point Numbers have decimal parts: -2.5 , 0.0, 98.6, 14.0

• There are other number types - they are variations on float and integer

Type Conversions

• When you put an integer and floating point in an expression, the integer is implicitly converted to a float

• You can control this with the built-in functions int() and float()
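A short sketch of implicit and explicit conversion, using illustrative values:

```python
i = 42
f = float(i)                 # explicit conversion: 42.0
mixed = 1 + 2 * 3 / 4.0 - 5  # the integers are converted to float along the way
whole = int(98.6)            # int() drops the decimal part: 98
print(f, mixed, whole)
```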

Integer Division

In Python 3, dividing two integers with / produces a floating point result
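A quick sketch of division in Python 3. The // operator is not mentioned on the slide; it is included here only for contrast:

```python
a = 10 / 2     # 5.0, a float even though the answer is whole
b = 9 / 2      # 4.5
c = 99 / 100   # 0.99
d = 9 // 2     # 4, floor division keeps the result an integer
print(a, b, c, d)
```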

String Conversions

• You can also use int() and float() to convert between strings and integers

• You will get an error if the string does not contain numeric characters
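A minimal sketch of string conversion, with illustrative values. The failed conversion raises ValueError, which is caught here so the example keeps running:

```python
sval = "123"
ival = int(sval)       # "123" becomes the integer 123
total = ival + 1

try:
    int("hello bob")   # not numeric, so the conversion fails
except ValueError as err:
    print("Conversion failed:", err)
```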

User Input

• We can instruct Python to pause and read data from the user using the input() function

• The input() function returns a string

Converting User Input

• If we want to read a number from the user, we must convert it from a string to a number using a type conversion function

• Later we will deal with bad input data
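A sketch of converting user input. The function name us_floor and the prompt text are assumptions for illustration, and a literal string stands in for what input() would return so the example runs without a keyboard:

```python
def us_floor(europe_floor_text):
    # input() always gives us text, so convert before doing arithmetic.
    return int(europe_floor_text) + 1

# In an interactive program the text would come from:
#   inp = input("Europe floor? ")
inp = "0"
print("US floor", us_floor(inp))
```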

Comments in Python

• Anything after a # is ignored by Python

• Why comment?

  • Describe what is going to happen in a sequence of code

  • Document who wrote the code or other ancillary information

  • Turn off a line of code - perhaps temporarily

Summary

• Type

• Reserved words

• Variables (mnemonic)

• Operators

• Operator precedence

• Integer Division

• Conversion between types

• User input

• Comments (#)

Conditional Execution

Chapter 3

Conditional Steps

Comparison Operators

• Boolean expressions ask a question and produce a Yes or No result which we use to control program flow

• Boolean expressions using comparison operators evaluate to True / False or Yes / No

• Comparison operators look at variables but do not change the variables

One-Way Decisions

Indentation

• Increase indent after an if statement or for statement (after : )

• Maintain indent to indicate the scope of the block (which lines are affected by the if/for)

• Reduce indent back to the level of the if statement or for statement to indicate the end of the block

• Blank lines are ignored - they do not affect indentation

• Comments on a line by themselves are ignored with regard to indentation
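The indentation rules can be seen in a small sketch; the list named messages is an assumption used only to record which lines ran:

```python
x = 5
messages = []
if x > 2:
    messages.append("Bigger than 2")   # inside the if block

    messages.append("Still bigger")    # the blank line above did not end the block
# a comment at the left margin does not end the block either
    for i in range(2):
        messages.append(i)             # inside the for block
    messages.append("Done with i")     # back in the if block
messages.append("All done")            # outside the if block
print(messages)
```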

Two-way Decisions

• Sometimes we want to do one thing if a logical expression is true and something else if the expression is false

• It is like a fork in the road - we must choose one or the other path but not both
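A minimal two-way decision, with illustrative values. Exactly one of the two branches runs:

```python
x = 4
if x > 2:
    result = "Bigger"
else:
    result = "Smaller"
print(result)
```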

Two-way Decisions with else:

Visualize Blocks

More Conditional Structures…

Multi-way

Multi-way Puzzles

The try / except Structure

• You surround a dangerous section of code with try and except

• If the code in the try works - the except is skipped

• If the code in the try fails - it jumps to the except section
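A sketch of the try / except structure. The helper name to_int and the fallback value of -1 are assumptions for illustration:

```python
def to_int(text):
    try:
        return int(text)   # the "dangerous" line
    except ValueError:
        return -1          # reached only when the conversion fails

print(to_int("123"))
print(to_int("Hello Bob"))
```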

try / except

Sample try / except

Summary

• Comparison operators: == <= >= > < !=

• Indentation

• One-way Decisions

• Two-way decisions: if and else

• Nested Decisions

• Multi-way decisions using elif

• try / except to compensate for errors

Functions

Chapter 4

Stored (and reused) Steps

Python Functions

• There are two kinds of functions in Python.

  • Built-in functions that are provided as part of Python - print(), input(), type(), float(), int() …

  • Functions that we define ourselves and then use

• We treat function names as “new” reserved words (i.e., we avoid them as variable names)

Function Definition

• In Python a function is some reusable code that takes argument(s) as input, does some computation, and then returns a result or results

• We define a function using the def reserved word

• We call/invoke the function by using the function name, parentheses, and arguments in an expression
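A short sketch of defining and calling a function, plus a call to the built-in max(). The function name thing is illustrative:

```python
def thing():
    print("Hello")
    print("Fun")

thing()   # call the function we defined
thing()   # and call it again

big = max("Hello world")   # built-in function used in an expression
print(big)
```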

Max Function

Type Conversions

• When you put an integer and floating point in an expression, the integer is implicitly converted to a float

• You can control this with the built-in functions int() and float()

String Conversions

• You can also use int() and float() to convert between strings and integers

• You will get an error if the string does not contain numeric characters

Functions of Our Own…

Building our Own Functions

• We create a new function using the def keyword followed by optional parameters in parentheses

• We indent the body of the function

• This defines the function but does not execute the body of the function

Definitions and Uses

• Once we have defined a function, we can call (or invoke) it as many times as we like

• This is the store and reuse pattern

Arguments

• An argument is a value we pass into the function as its input when we call the function

• We use arguments so we can direct the function to do different kinds of work when we call it at different times

• We put the arguments in parentheses after the name of the function

Parameters

A parameter is a variable which we use in the function definition. It is a “handle” that allows the code in the function to access the arguments for a particular function invocation.

Return Values

Often a function will take its arguments, do some computation, and return a value to be used as the value of the function call in the calling expression. The return keyword is used for this.

Return Value

• A “fruitful” function is one that produces a result (or return value)

• The return statement ends the function execution and “sends back” the result of the function
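A sketch tying parameters, arguments, and return values together. The function greet and its language codes are assumptions for illustration; lang is the parameter, and "es", "fr", and "en" are arguments:

```python
def greet(lang):
    if lang == "es":
        return "Hola"      # return ends the function and sends back a value
    elif lang == "fr":
        return "Bonjour"
    else:
        return "Hello"

print(greet("en"), "Glenn")
print(greet("es"), "Sally")
print(greet("fr"), "Michael")
```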

Arguments, Parameters, and Results

Multiple Parameters / Arguments

• We can define more than one parameter in the function definition

• We simply add more arguments when we call the function

• We match the number and order of arguments and parameters
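A minimal sketch with two parameters; the name addtwo is illustrative. The first argument goes to a and the second to b:

```python
def addtwo(a, b):
    added = a + b
    return added

x = addtwo(3, 5)
print(x)
```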

Void (non-fruitful) Functions

• When a function does not return a value, we call it a “void” function

• Functions that return values are “fruitful” functions

• Void functions are “not fruitful”
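A small sketch of a void function, using an illustrative name. It does its work by printing, and the value of the call is None:

```python
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")

result = print_lyrics()   # the function prints, but returns nothing
print(result)
```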

To function or not to function…

• Organize your code into “paragraphs” - capture a complete thought and “name it”

• Don’t repeat yourself - make it work once and then reuse it

• If something gets too long or complex, break it up into logical chunks and put those chunks in functions

• Make a library of common stuff that you do over and over - perhaps share this with your friends…

Summary

• Arguments

• Results (fruitful functions)

• Void (non-fruitful) functions

• Why use functions?

• Functions

• Built-In Functions

• Type conversion (int, float)

• String conversions

• Parameters

Loops and Iteration

Chapter 5

Repeated Steps

An Infinite Loop

Another Loop

Breaking Out of a Loop

• The break statement ends the current loop and jumps to the statement immediately following the loop

• It is like a loop test that can happen anywhere in the body of the loop
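A sketch of break in a loop that would otherwise run forever. As an assumption made so the example runs unattended, a prepared list of lines stands in for text typed by the user:

```python
lines = iter(["hello there", "finished", "done", "never reached"])
echoed = []
while True:
    line = next(lines)     # stands in for input("> ")
    if line == "done":
        break              # leave the loop immediately
    echoed.append(line)
print(echoed)
print("Done!")
```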

Finishing an Iteration with continue

The continue statement ends the current iteration and jumps to the top of the loop and starts the next iteration
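A sketch of continue, again with a prepared list standing in for user input (an assumption for illustration). Lines that start with # are skipped without ending the loop:

```python
lines = iter(["# a comment", "print this", "done"])
echoed = []
while True:
    line = next(lines)         # stands in for input("> ")
    if line.startswith("#"):
        continue               # skip the rest of this iteration
    if line == "done":
        break
    echoed.append(line)
print(echoed)
```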

Indefinite Loops

• While loops are called “indefinite loops” because they keep going until a logical condition becomes False

• The loops we have seen so far are pretty easy to examine to see if they will terminate or if they will be “infinite loops”

• Sometimes it is a little harder to be sure if a loop will terminate

Definite Loops

Iterating over a set of items…

Definite Loops

• Quite often we have a list of items or the lines in a file - effectively a finite set of things

• We can write a loop to run the loop once for each of the items in a set using the Python for construct

• These loops are called “definite loops” because they execute an exact number of times

• We say that “definite loops iterate through the members of a set”
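A minimal definite loop, with illustrative values. The body runs once for each of the five items:

```python
for i in [5, 4, 3, 2, 1]:
    print(i)
print("Blastoff!")
```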

A Simple Definite Loop

A Definite Loop with Strings

Looking at in…

• The iteration variable “iterates” through the sequence (ordered set)

• The block (body) of code is executed once for each value in the sequence

• The iteration variable moves through all of the values in the sequence

Loop Idioms: What We Do in Loops

Note: Even though these examples are simple, the patterns apply to all kinds of loops

Making “smart” loops

The trick is “knowing” something about the whole loop when you are stuck writing code that only sees one entry at a time
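One way to sketch a “smart” loop is to carry a variable that remembers the best value seen so far. The numbers are illustrative, and starting at -1 assumes that no value in the list is negative:

```python
largest_so_far = -1            # assumes all the numbers are 0 or greater
for the_num in [9, 41, 12, 3, 74, 15]:
    if the_num > largest_so_far:
        largest_so_far = the_num
    print(largest_so_far, the_num)
print("Largest:", largest_so_far)
```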

Looping Through a Set

What is the Largest Number?

Finding the Largest Value

More Loop Patterns…

Counting in a Loop

Summing in a Loop

Finding the Average in a Loop

Filtering in a Loop

Search Using a Boolean Variable

How to Find the Smallest Value

Finding the Smallest Value

The is and is not Operators

• Python has an is operator that can be used in logical expressions

• Implies “is the same as”

• Similar to, but stronger than ==

• is not is also a logical operator
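A sketch of where is tends to be used: testing for None. Here None marks “no value seen yet” while searching for the smallest number; the values are illustrative:

```python
smallest = None
for value in [9, 41, 12, 3, 74, 15]:
    if smallest is None:       # first time through the loop
        smallest = value
    elif value < smallest:
        smallest = value
print("Smallest:", smallest)
```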

Summary

• While loops (indefinite)

• Infinite loops

• Using break

• Using continue

• None constants and variables

• For loops (definite)

• Iteration variables

• Loop idioms

• Largest or smallest

Strings

Chapter 6

String Data Type

• A string is a sequence of characters

• A string literal uses quotes: 'Hello' or "Hello"

• For strings, + means “concatenate”

• When a string contains numbers, it is still a string

• We can convert numbers in a string into a number using int()

Reading and Converting

• We prefer to read data in using strings and then parse and convert the data as we need

• This gives us more control over error situations and/or bad user input

• Input numbers must be converted from strings

Looking Inside Strings

• We can get at any single character in a string using an index specified in square brackets

• The index value must be an integer and starts at zero

• The index value can be an expression that is computed
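A short sketch of indexing, with illustrative values:

```python
fruit = "banana"
letter = fruit[1]    # index 1 is the second character: 'a'
x = 3
w = fruit[x - 1]     # the index can be a computed expression: fruit[2] is 'n'
print(letter, w)
```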

A Character Too Far

• You will get a Python error if you attempt to index beyond the end of a string

• So be careful when constructing index values and slices
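A minimal sketch of indexing too far, using an illustrative string. The error is an IndexError, caught here so the example continues:

```python
zot = "abc"
try:
    zot[5]                     # only indexes 0, 1 and 2 exist
except IndexError as err:
    print("Index problem:", err)

last = zot[len(zot) - 1]       # the last valid index is length minus one
print(last)
```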

Strings Have Length

The built-in function len gives us the length of a string

len Function

Looping Through Strings

Using a while statement, an iteration variable, and the len function, we can construct a loop to look at each of the letters in a string individually

Looping Through Strings

• A definite loop using a for statement is much more elegant

• The iteration variable is completely taken care of by the for loop
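For comparison, a sketch of the same walk through a string written both ways. The list names are assumptions used only to record the letters visited:

```python
fruit = "banana"

# Indefinite loop: we manage the index ourselves.
seen_while = []
index = 0
while index < len(fruit):
    seen_while.append(fruit[index])
    index = index + 1

# Definite loop: the for statement handles the iteration variable.
seen_for = []
for letter in fruit:
    seen_for.append(letter)

print(seen_while == seen_for)
```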

Looping and Counting

This is a simple loop that loops through each letter in a string and counts the number of times the loop encounters the 'a' character
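A sketch of the looping-and-counting pattern, using an illustrative word:

```python
word = "banana"
count = 0
for letter in word:
    if letter == "a":
        count = count + 1
print(count)
```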

Looking Deeper into in

• The iteration variable “iterates” through the sequence (ordered set)

• The block (body) of code is executed once for each value in the sequence

• The iteration variable moves through all of the values in the sequence

More String Operations

Slicing Strings

• We can also look at any continuous section of a string using a colon operator

• The second number is one beyond the end of the slice - “up to but not including”

• If the second number is beyond the end of the string, it stops at the end

Slicing Strings

If we leave off the first number or the last number of the slice, it is assumed to be the beginning or end of the string respectively
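A sketch of slicing, using an illustrative string:

```python
s = "Monty Python"
a = s[0:4]     # 'Mont'   - up to but not including index 4
b = s[6:7]     # 'P'
c = s[6:20]    # 'Python' - stops at the end of the string
d = s[:2]      # 'Mo'     - first number left off
e = s[8:]      # 'thon'   - last number left off
f = s[:]       # the whole string
print(a, b, c, d, e, f)
```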

String Concatenation

When the + operator is applied to strings, it means “concatenation”

Using in as a Logical Operator

• The in keyword can also be used to check to see if one string is “in” another string

• The in expression is a logical expression that returns True or False and can be used in an if statement
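A minimal sketch of in as a logical operator, with illustrative values:

```python
fruit = "banana"
has_n = "n" in fruit       # True
has_m = "m" in fruit       # False
has_nan = "nan" in fruit   # True, substrings count too
if "a" in fruit:
    print("Found it!")
```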

String Comparison

String Library

• Python has a number of string functions which are in the string library

• These functions are already built into every string - we invoke them by appending the function to the string variable

• These functions do not modify the original string, instead they return a new string that has been altered
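A short sketch showing that a string method returns a new string and leaves the original alone; the variable names are illustrative:

```python
greet = "Hello Bob"
zap = greet.lower()    # a new, lower-case string
print(zap)
print(greet)           # the original is unchanged
print("Hi There".lower())
```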

Searching a String

• We use the find() function to search for a substring within another string

• find() finds the first occurrence of the substring

• If the substring is not found, find() returns -1

• Remember that string position starts at zero
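A sketch of find(), using an illustrative string:

```python
fruit = "banana"
pos = fruit.find("na")    # first occurrence starts at index 2
missing = fruit.find("z") # not found
print(pos, missing)
```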

Making everything UPPER CASE

• You can make a copy of a string in lower case or upper case

• Often when we are searching for a string using find() we first convert the string to lower case so we can search a string regardless of case

Search and Replace

• The replace() function is like a “search and replace” operation in a word processor

• It replaces all occurrences of the search string with the replacement string
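A sketch of replace(), with illustrative values. The second call shows that every occurrence is replaced:

```python
greet = "Hello Bob"
nstr = greet.replace("Bob", "Jane")
xstr = greet.replace("o", "X")
print(nstr)
print(xstr)
```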

Stripping Whitespace

• Sometimes we want to take a string and remove whitespace at the beginning and/or end

• lstrip() and rstrip() remove whitespace at the left or right

• strip() removes both beginning and ending whitespace
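A sketch of the three stripping methods on an illustrative string with three leading and two trailing spaces:

```python
greet = "   Hello Bob  "
left = greet.lstrip()    # leading whitespace removed
right = greet.rstrip()   # trailing whitespace removed
both = greet.strip()     # both ends removed
print(repr(left), repr(right), repr(both))
```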

Two Kinds of Strings

Summary

• String type

• Read/Convert

• Indexing strings []

• Slicing strings [2:4]

• Looping through strings with for and while

• Concatenating strings with +

• String operations

• String library

• String comparisons

• Searching in strings

• Replacing text

• Stripping white space

Reading Files

Chapter 7

File Processing

A text file can be thought of as a sequence of lines

Opening a File

• Before we can read the contents of the file, we must tell Python which file we are going to work with and what we will be doing with the file

• This is done with the open() function

• open() returns a “file handle” - a variable used to perform operations on the file

• Similar to “File -> Open” in a Word Processor

Using open()

handle = open(filename, mode)

returns a handle used to manipulate the file

filename is a string

mode is optional and should be 'r' if we are planning to read the file and 'w' if we are going to write to the file
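A sketch of open() in both modes. The file name sample.txt and its contents are assumptions for illustration, and the file is created in a temporary folder so the example is self-contained:

```python
import os
import tempfile

# Hypothetical file in a scratch folder.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

out_handle = open(path, "w")      # 'w' - we plan to write
out_handle.write("first line\nsecond line\n")
out_handle.close()

in_handle = open(path, "r")       # 'r' - we plan to read
text = in_handle.read()
in_handle.close()
print(text)
```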

What is a Handle?

When Files are Missing

The newline Character

• We use a special character called the “newline” to indicate when a line ends

• We represent it as \n in strings

• Newline is still one character - not two
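A minimal sketch showing that the newline counts as a single character; the string is illustrative:

```python
stuff = "X\nY"
print(stuff)         # prints X and Y on separate lines
print(len(stuff))    # X, newline, Y
```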

File Processing

A text file has newlines at the end of each line

Reading Files in Python

File Handle as a Sequence

• A file handle open for read can be treated as a sequence of strings where each line in the file is a string in the sequence

• We can use the for statement to iterate through a sequence

• Remember - a sequence is an ordered set

Counting Lines in a File

• Open a file read-only

• Use a for loop to read each line

• Count the lines and print out the number of lines
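A sketch of the line-counting steps. As an assumption to keep the example self-contained, io.StringIO with made-up text stands in for a handle returned by open():

```python
import io

# Stand-in for: fhand = open("some-file.txt")
fhand = io.StringIO("line one\nline two\nline three\n")

count = 0
for line in fhand:
    count = count + 1
print("Line Count:", count)
```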

Reading the Whole File

We can read the whole file (newlines and all) into a single string

Searching Through a File

We can put an if statement in our for loop to only print lines that meet some criteria

OOPS!

Each line from the file has a newline at the end

The print() function adds a newline to each line

Searching Through a File (fixed)

We can strip the whitespace from the right-hand side of the string using rstrip() from the string library

The newline is considered “white space” and is stripped
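A sketch of the search with rstrip() applied. The sample lines and addresses are made up, and io.StringIO stands in for a file handle:

```python
import io

# Stand-in for a file opened with open(); the contents are hypothetical.
fhand = io.StringIO(
    "From: alice@example.org\n"
    "Subject: hello\n"
    "From: bob@example.org\n"
)

matches = []
for line in fhand:
    line = line.rstrip()              # drop the trailing newline
    if line.startswith("From:"):
        matches.append(line)
        print(line)                   # no blank lines between results now
```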

Skipping with continue

We can conveniently skip a line by using the continue statement

Using in to Select Lines

We can look for a string anywhere in a line as our selection criteria
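A sketch of selecting lines with in. The sample text and the search string "@example.org" are assumptions for illustration, and io.StringIO stands in for a file handle:

```python
import io

# Stand-in for a file opened with open(); the contents are hypothetical.
fhand = io.StringIO(
    "From: alice@example.org\n"
    "From: carol@example.com\n"
    "Warning: sender set to bob@example.org\n"
)

selected = []
for line in fhand:
    line = line.rstrip()
    if "@example.org" not in line:
        continue                      # skip lines we are not interested in
    selected.append(line)
print(selected)
```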

Prompt for File Name

Bad File Names

Summary

• Secondary storage

• Opening a file - file handle

• File structure - newline character

• Reading a file line by line with a for loop

• Searching for lines

• Reading file names

• Dealing with bad files

Python Lists

Chapter 8

Programming

Algorithm

  • A set of rules or steps used to solve a problem

Data Structure

  • A particular way of organizing data in a computer

What is Not a “Collection”?

Most of our variables have one value in them - when we put a new value in the variable, the old value is overwritten

A List is a Kind of Collection

• A collection allows us to put many values in a single “variable”

• A collection is nice because we can carry many values around in one convenient package.

List Constants

List constants are surrounded by square brackets and the elements in the list are separated by commas

A list element can be any Python object - even another list

A list can be empty
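A few list constants as a sketch, with illustrative values:

```python
numbers = [1, 24, 76]
colors = ["red", "yellow", "blue"]
mixed = ["red", 24, 98.6]        # elements can have different types
nested = [1, [5, 6], 7]          # a list can hold another list
empty = []
print(numbers, colors, mixed, nested, empty)
```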

We Already Use Lists!

Lists and Definite Loops - Best Pals

Looking Inside Lists

Just like strings, we can get at any single element in a list using an index specified in square brackets

Lists are Mutable

Strings are “immutable” - we cannot change the contents of a string - we must make a new string to make any change

Lists are “mutable” - we can change an element of a list using the index operator
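A sketch contrasting the two, with illustrative values. Assigning into a string raises TypeError, while assigning into a list changes it in place:

```python
fruit = "Banana"
try:
    fruit[0] = "b"               # strings cannot be changed in place
except TypeError as err:
    print("Not allowed:", err)
lowered = fruit.lower()          # make a new string instead

lotto = [2, 14, 26, 41, 63]
lotto[2] = 28                    # lists can be changed in place
print(lowered, lotto)
```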

How Long is a List?

The len() function takes a list as a parameter and returns the number of elements in the list

Actually len() tells us the number of elements of any set or sequence (such as a string…)

Using the range Function

The range function returns a sequence of numbers that range from zero to one less than the parameter

We can construct an index loop using for and an integer iterator
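A sketch of an index loop built with range() and len(). The names and message text are illustrative; list() is used only to display the numbers that range() produces:

```python
friends = ["Joseph", "Glenn", "Sally"]
indexes = list(range(len(friends)))    # the numbers 0, 1, 2

greetings = []
for i in range(len(friends)):
    greetings.append("Happy New Year: " + friends[i])
print(indexes, greetings)
```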

A Tale of Two Loops…

Concatenating Lists Using +

We can create a new list by adding two existing lists together

Lists Can Be Sliced Using :

List Methods

Building a List from Scratch

We can create an empty list and then add elements using the append method

The list stays in order and new elements are added at the end of the list
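A minimal sketch of building a list with append(), using illustrative values:

```python
stuff = list()          # an empty list
stuff.append("book")
stuff.append(99)
stuff.append("cookie")  # new elements go at the end
print(stuff)
```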

Is Something in a List?

Python provides two operators (in and not in) that let you check if an item is in a list

These are logical operators that return True or False

They do not modify the list
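A sketch of the membership operators on an illustrative list:

```python
some = [1, 9, 21, 10, 16]
a = 9 in some         # True
b = 15 in some        # False
c = 20 not in some    # True
print(a, b, c, some)  # the list itself is unchanged
```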

Lists are in Order

• A list can hold many items and keeps those items in the order until we do something to change the order

• A list can be sorted (i.e., change its order)

• The sort method (unlike in strings) means “sort yourself”
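A sketch of sort(), with illustrative names. The method rearranges the list itself and returns None rather than a new list:

```python
friends = ["Joseph", "Glenn", "Sally"]
returned = friends.sort()   # sorts in place
print(friends)
print(returned)
```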

Built-in Functions and Lists

There are a number of functions built into Python that take lists as parameters

Remember the loops we built? These are much simpler.
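A sketch of the built-in functions that replace the hand-written loops; the numbers are illustrative:

```python
nums = [3, 41, 12, 9, 74, 15]
count = len(nums)
largest = max(nums)
smallest = min(nums)
total = sum(nums)
average = total / count
print(count, largest, smallest, total, average)
```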

Best Friends: Strings and Lists

The Double Split Pattern

Sometimes we split a line one way, and then grab one of the pieces of the line and split that piece again
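A sketch of the double split. The sample line and address are made up for illustration:

```python
line = "From someone@example.org Sat Jan  5 09:14:16 2008"
words = line.split()          # first split: on whitespace
email = words[1]
pieces = email.split("@")     # second split: on the @ sign
host = pieces[1]
print(host)
```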

List Summary

Python Dictionaries

Chapter 9

What is a Collection?

• A collection is nice because we can put more than one value in it and carry them all around in one convenient package

• We have a bunch of values in a single “variable”

• We do this by having more than one place “in” the variable

• We have ways of finding the different places in the variable

What is Not a “Collection”?

Most of our variables have one value in them - when we put a new value in the variable - the old value is overwritten

A Story of Two Collections..

• List

A linear collection of values
Lookup by position 0 .. length-1

• Dictionary

A linear collection of key-value pairs
Lookup by “tag” or “key”

Dictionaries

• Dictionaries are Python’s most powerful data collection

• Dictionaries allow us to do fast database-like operations in Python

• Similar concepts in different programming languages

  • Associative Arrays - Perl / PHP

  • Properties or Map or HashMap - Java

  • Property Bag - C# / .Net

Dictionaries over time in Python

Prior to Python 3.7 dictionaries did not keep entries in the order of insertion

Python 3.7 (2018) and later dictionaries keep entries in the order they were inserted

“insertion order” is not “always sorted order”
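A sketch of the difference between insertion order and sorted order. It assumes Python 3.7 or later, and the keys are illustrative:

```python
counts = {}
counts["zebra"] = 1
counts["apple"] = 2
counts["mango"] = 3

inserted = list(counts)      # keys in the order they were added
alphabetical = sorted(counts)
print(inserted)
print(alphabetical)
```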

Below the Abstraction

Python lists, dictionaries, and tuples are “abstract objects” designed to be easy to use

For now we will just understand them and use them and thank the creators of Python for making them easy for us

Using Python collections is easy. Creating the code to support them is tricky and uses Computer Science concepts like dynamic memory, arrays, linked lists, hash maps and trees.

But that implementation detail is for a later course…

Lists (Review)

• We append values to the end of a List and look them up by position

• We insert values into a Dictionary using a key and retrieve them using a key

Comparing Lists and Dictionaries

Dictionaries are like lists except that they use keys instead of positions to look up values

Dictionary Literals (Constants)

Dictionary literals use curly braces and have key : value pairs

You can make an empty dictionary using empty curly braces
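A sketch of dictionary literals, with illustrative names and values:

```python
jjj = {"chuck": 1, "fred": 42, "jan": 100}
ooo = {}                 # an empty dictionary
print(jjj["fred"])
print(len(ooo))
```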

Most Common Name?

Many Counters with a Dictionary

One common use of dictionaries is counting how often we “see” something

Dictionary Tracebacks

• It is an error to reference a key which is not in the dictionary

• We can use the in operator to see if a key is in the dictionary
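A sketch of a missing key, using an illustrative key name. The lookup raises KeyError, while in answers the question safely:

```python
ccc = {}
try:
    print(ccc["csev"])            # the key is not there
except KeyError as err:
    print("Missing key:", err)

present = "csev" in ccc           # check before looking up
print(present)
```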

When We See a New Name

When we encounter a new name, we need to add a new entry in the dictionary. If this is the second or later time we have seen the name, we simply add one to the count in the dictionary under that name
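A sketch of this counting logic with an explicit check; the names are illustrative:

```python
counts = {}
names = ["csev", "cwen", "csev", "zqian", "cwen"]
for name in names:
    if name not in counts:
        counts[name] = 1                   # first time we see the name
    else:
        counts[name] = counts[name] + 1    # seen before: add one
print(counts)
```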

The get Method for Dictionaries

The pattern of checking to see if a key is already in a dictionary and assuming a default value if the key is not there is so common that there is a method called get() that does this for us

Simplified Counting with get()

We can use get() and provide a default value of zero when the key is not yet in the dictionary - and then just add one
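A sketch of the same count written with get(); the names are illustrative:

```python
counts = {}
names = ["csev", "cwen", "csev", "zqian", "cwen"]
for name in names:
    # get() returns 0 when the name is not yet a key.
    counts[name] = counts.get(name, 0) + 1
print(counts)
```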

Counting Pattern

Definite Loops and Dictionaries

We can write a for loop that goes through all the entries in a dictionary - actually it goes through all of the keys in the dictionary and looks up the values
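A sketch of looping over a dictionary, with illustrative data. The iteration variable takes each key in turn, and the value is looked up inside the loop:

```python
counts = {"chuck": 1, "fred": 42, "jan": 100}
seen = []
for key in counts:
    print(key, counts[key])
    seen.append(key)
```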

Retrieving Lists of Keys and Values

You can get a list of keys, values, or items (both) from a dictionary
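A sketch of retrieving keys, values, and items, with illustrative data. In Python 3 these methods return view objects, so list() is used here to turn them into lists:

```python
jjj = {"chuck": 1, "fred": 42, "jan": 100}
keys = list(jjj.keys())
values = list(jjj.values())
items = list(jjj.items())    # a list of (key, value) tuples
print(keys, values, items)
```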

Bonus: Two Iteration Variables!

We loop through the key-value pairs in a dictionary using two iteration variables

Each iteration, the first variable is the key and the second variable is the corresponding value for the key
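A sketch of a loop with two iteration variables; the data and the names aaa and bbb are illustrative:

```python
jjj = {"chuck": 1, "fred": 42, "jan": 100}
total = 0
for aaa, bbb in jjj.items():
    print(aaa, bbb)          # aaa is the key, bbb is its value
    total = total + bbb
```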

Summary

What is a collection

Lists versus dictionaries

Dictionary Constants

The most common word

Using the get() method

Writing dictionary loops

Sneak peek: Tuples

Tuples

Chapter 10

Tuples Are Like Lists

Tuples are another kind of sequence that functions much like a list - they have elements which are indexed starting at 0

but… Tuples are “immutable”

Unlike a list, once you create a tuple, you cannot alter its contents - similar to a string
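A sketch of tuple immutability, with illustrative values. Reading by index works as with a list, but assignment raises TypeError:

```python
x = (9, 8, 7)
first = x[0]            # indexing works as with a list
try:
    x[2] = 6            # changing an element does not
except TypeError as err:
    print("Not allowed:", err)
print(first, x)
```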

Things not to do With Tuples

A Tale of Two Sequences

Tuples are More Efficient

• Since Python does not have to build tuple structures to be modifiable, they are simpler and more efficient in terms of memory use and performance than lists

• So in our program when we are making “temporary variables” we prefer tuples over lists

Tuples and Assignment

• We can also put a tuple on the left-hand side of an assignment statement

• We can even omit the parentheses
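A sketch of tuple assignment, with illustrative values:

```python
(x, y) = (4, "fred")    # a tuple on the left-hand side
a, b = 99, 98           # the parentheses can be omitted
print(x, y, a, b)
```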

Tuples and Dictionaries

The items() method in dictionaries returns the dictionary's entries as (key, value) tuples

Tuples are Comparable

The comparison operators work with tuples and other sequences. If the first item is equal, Python goes on to the next element, and so on, until it finds elements that differ.
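
A few sample comparisons. Each result is decided by the first position where the two tuples differ.

```python
# Decided by the first element: 0 < 5
first = (0, 1, 2) < (5, 1, 2)

# First elements tie, so the second decides: 1 < 3 (the 2000000 is never looked at)
second = (0, 1, 2000000) < (0, 3, 4)

# Strings inside tuples compare the same way: 'Sally' < 'Sam'
third = ('Jones', 'Sally') < ('Jones', 'Sam')
print(first, second, third)
```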

Sorting Lists of Tuples

• We can take advantage of the ability to sort a list of tuples to get a sorted version of a dictionary

• First we sort the dictionary by the key using the items() method and sorted() function

Using sorted()

We can do this even more directly using the built-in function sorted that takes a sequence as a parameter and returns a sorted sequence

Sort by Values Instead of Key

• If we could construct a list of tuples of the form (value, key) we could sort by value

• We do this with a for loop that creates a list of tuples
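
A sketch of both sorts on a small sample dictionary: by key using sorted() on items(), then by value using a list of (value, key) tuples. The last line is one possible shorter form using a list comprehension.

```python
c = {'a': 10, 'b': 1, 'c': 22}

# Sort by key: items() yields (key, value) tuples, which sort by key first
by_key = sorted(c.items())

# Sort by value: build (value, key) tuples, then sort largest first
tmp = list()
for k, v in c.items():
    tmp.append((v, k))
by_value = sorted(tmp, reverse=True)

# The same thing written as a list comprehension
shorter = sorted([(v, k) for k, v in c.items()], reverse=True)
print(by_key, by_value)
```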

Even Shorter Version

Summary

• Tuple syntax

• Immutability

• Comparability

• Sorting

• Tuples in assignment statements

• Sorting dictionaries by either key or value

Acknowledgements / Contributions

Regular Expressions

Chapter 11

Regular Expressions

Regular Expressions

Understanding Regular Expressions

• Very powerful and quite cryptic

• Fun once you understand them

• Regular expressions are a language unto themselves

• A language of “marker characters” - programming with characters

• It is kind of an “old school” language - compact

Regular Expression Quick Guide

The Regular Expression Module

• Before you can use regular expressions in your program, you must import the library using “import re”

• You can use re.search() to see if a string matches a regular expression, similar to using the find() method for strings

• You can use re.findall() to extract portions of a string that match your regular expression, similar to a combination of find() and slicing: var[5:10]
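
A minimal sketch of both calls. The sample lines are invented for illustration.

```python
import re

lines = [
    'From: stephen.marquard@uct.ac.za',
    'Subject: hello',
    'X-From: nobody',
    'From: louis@media.berkeley.edu',
]

# Like find(): keep lines that contain 'From:' anywhere
found = [line for line in lines if re.search('From:', line)]

# Like startswith(): ^ anchors the match to the beginning of the line
starts = [line for line in lines if re.search('^From:', line)]

# findall() returns every matching piece as a list of strings
numbers = re.findall('[0-9]+', 'My 2 favorite numbers are 19 and 42')
print(found, starts, numbers)
```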

Using re.search() Like find()

Using re.search() Like startswith()

Wild-Card Characters

• The dot character matches any character

• If you add the asterisk character, the preceding character is matched “any number of times” (zero or more)

Fine-Tuning Your Match

Depending on how “clean” your data is and the purpose of your application, you may want to narrow your match down a bit

Matching and Extracting Data

• re.search() returns a match object (which counts as True) or None (which counts as False) depending on whether the string matches the regular expression

• If we actually want the matching strings to be extracted, we use re.findall()

Matching and Extracting Data

When we use re.findall(), it returns a list of zero or more sub-strings that match the regular expression

Warning: Greedy Matching

The repeat characters (* and +) push outward in both directions (greedy) to match the largest possible string

Non-Greedy Matching

Not all regular expression repeat codes are greedy! If you add a ? character, the + and * chill out a bit…
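
A sketch contrasting the two behaviors on one sample string.

```python
import re

x = 'From: Using the : character'

# Greedy: .+ grabs as much as it can, so the match runs to the LAST colon
greedy = re.findall('^F.+:', x)

# Non-greedy: .+? stops as soon as it can, at the FIRST colon
non_greedy = re.findall('^F.+?:', x)
print(greedy, non_greedy)
```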

Fine-Tuning String Extraction

You can refine the match for re.findall() and separately determine which portion of the match is to be extracted by using parentheses

Fine-Tuning String Extraction

Parentheses are not part of the match - but they tell where to start and stop what string to extract
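
A sketch of extraction with parentheses. The line is a sample mail header in the usual mbox style.

```python
import re

x = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# Without parentheses we get the whole match
whole = re.findall(r'^From \S+@\S+', x)

# With parentheses the line must still start with 'From ',
# but only the part inside ( ) is returned
address = re.findall(r'^From (\S+@\S+)', x)
print(whole, address)
```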

String Parsing Examples…

The Double Split Pattern

Sometimes we split a line one way, and then grab one of the pieces of the line and split that piece again
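
A sketch of the double split on a sample mail header line: split on whitespace first, then split the address on the at sign.

```python
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

words = line.split()        # first split: on whitespace
email = words[1]            # 'stephen.marquard@uct.ac.za'
pieces = email.split('@')   # second split: on the at sign
host = pieces[1]
print(host)
```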

The Regex Version

Even Cooler Regex Version

Spam Confidence

Escape Character

If you want a special regular expression character to just behave normally (most of the time) you prefix it with ‘\’
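
A sketch using a sample sentence: the dollar sign normally means "end of line", so it is escaped to match a real dollar sign.

```python
import re

x = 'We just received $10.00 for cookies.'

# \$ is a literal dollar sign; [0-9.]+ is one or more digits or periods
prices = re.findall(r'\$[0-9.]+', x)
print(prices)
```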

Summary

• Regular expressions are a cryptic but powerful language for matching strings and extracting elements from those strings

• Regular expressions have special characters that indicate intent

Acknowledgements / Contributions

Networked Programs

Chapter 12

A Free Book on Network Architecture

If you find this topic area interesting and/or need more detail

www.net-intro.com

Transmission Control Protocol (TCP)

Built on top of IP (Internet Protocol)

Assumes IP might lose some data - stores and retransmits data if it seems to be lost

Handles “flow control” using a transmit window

Provides a nice reliable pipe

TCP Connections / Sockets

TCP Port Numbers

• A port is an application-specific or process-specific software communications endpoint

• It allows multiple networked applications to coexist on the same server

• There is a list of well-known TCP port numbers

Common TCP Ports

Sockets in Python

Python has built-in support for TCP Sockets

Application Protocols

Application Protocol

Since TCP (and Python) gives us a reliable socket, what do we want to do with the socket? What problem do we want to solve?

Application Protocols

  • Mail

  • World Wide Web

HTTP - Hypertext Transfer Protocol

The dominant Application Layer Protocol on the Internet

Invented for the Web - to Retrieve HTML, Images, Documents, etc.

Extended to retrieve data in addition to documents - RSS, Web Services, etc.

Basic Concept - Make a Connection - Request a document - Retrieve the Document - Close the Connection

HTTP

The HyperText Transfer Protocol is the set of rules to allow browsers to retrieve web documents from servers over the Internet

What is a Protocol?

A set of rules that all parties follow so we can predict each other’s behavior

And not bump into each other

  • On two-way roads in USA, drive on the right-hand side of the road

  • On two-way roads in the UK, drive on the left-hand side of the road

Getting Data From The Server

Each time the user clicks on an anchor tag with an href= value to switch to a new page, the browser makes a connection to the web server and issues a “GET” request - to GET the content of the page at the specified URL

The server returns the HTML document to the browser, which formats and displays the document to the user

Internet Standards

The standards for all of the Internet protocols (inner workings) are developed by an organization

Internet Engineering Task Force (IETF)

www.ietf.org

Standards are called “RFCs” - “Request for Comments”

Making an HTTP request

Connect to a server such as www.dr-chuck.com

Request a document (or the default document)

GET http://www.dr-chuck.com/page1.htm HTTP/1.0

GET http://www.mlive.com/ann-arbor/ HTTP/1.0

GET http://www.facebook.com HTTP/1.0
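
A rough sketch of these steps with Python sockets. The helper names build_request and http_get are made up for this example, and port 80 is assumed. Nothing is sent over the network until http_get is actually called.

```python
import socket

def build_request(url):
    # An HTTP/1.0 request is one GET line followed by a blank line,
    # encoded to bytes before it goes out on the socket
    return ('GET ' + url + ' HTTP/1.0\r\n\r\n').encode()

def http_get(host, url, port=80):
    """Connect, send the GET request, and return the raw response bytes."""
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect((host, port))
    mysock.sendall(build_request(url))
    chunks = []
    while True:
        data = mysock.recv(512)
        if len(data) < 1:      # empty read means the server closed the connection
            break
        chunks.append(data)
    mysock.close()
    return b''.join(chunks)

request = build_request('http://www.dr-chuck.com/page1.htm')
print(request)
```

A call such as http_get('www.dr-chuck.com', 'http://www.dr-chuck.com/page1.htm') would return the headers, a blank line, and then the document, assuming the server is reachable.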

Accurate Hacking in the Movies

Matrix Reloaded

Bourne Ultimatum

Die Hard 4

Let’s Write a Web Browser!

An HTTP Request in Python

About Characters and Strings…

ASCII

Representing Simple Strings

Each character is represented by a number between 0 and 255 stored in 8 bits of memory

We refer to “8 bits of memory” as a “byte” of memory (e.g. my disk drive contains 3 Terabytes of memory)

The ord() function tells us the numeric value of a simple ASCII character
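
A quick sketch of ord() on a few sample characters, with chr() going the other way.

```python
h = ord('H')          # 72
e = ord('e')          # 101
newline = ord('\n')   # 10
back = chr(72)        # 'H'
print(h, e, newline, back)
```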

ASCII

Multi-Byte Characters

To represent the wide range of characters computers must handle we represent characters with more than one byte

UTF-16 – Variable length - Two or four bytes

UTF-32 – Fixed Length - Four Bytes

UTF-8 – 1-4 bytes

  • Upwards compatible with ASCII

  • Automatic detection between ASCII and UTF-8

  • UTF-8 is recommended practice for encoding data to be exchanged between systems

Two Kinds of Strings in Python

Python 2 versus Python 3

Python 3 and Unicode

In Python 3, all strings internally are UNICODE

Working with string variables in Python programs and reading data from files usually “just works”

When we talk to a network resource using sockets or talk to a database we have to encode and decode data (usually to UTF-8)

Python Strings to Bytes

When we talk to an external resource like a network socket we send bytes, so we need to encode Python 3 strings into a given character encoding

When we read data from an external resource, we must decode it based on the character set so it is properly represented in Python 3 as a string
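
A small sketch of the round trip with a sample string. The two non-ASCII characters each take three bytes in UTF-8, so the byte count is larger than the character count.

```python
text = 'Hello, 世界'

data = text.encode('utf-8')    # str -> bytes, ready to send
back = data.decode('utf-8')    # bytes -> str, after receiving

print(type(text), len(text))   # 9 characters
print(type(data), len(data))   # 13 bytes
```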

An HTTP Request in Python

Making HTTP Easier With urllib

Using urllib in Python

Since HTTP is so common, we have a library that does all the socket work for us and makes web pages look like a file
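
A minimal sketch of reading a URL like a file and counting its words. The function name count_words is made up for this example, and the page is assumed to be UTF-8 text.

```python
import urllib.request

def count_words(url):
    """Read the document at url line by line and count its words."""
    counts = dict()
    with urllib.request.urlopen(url) as fhand:
        for line in fhand:
            # Each line arrives as bytes, so decode it into a string first
            words = line.decode().split()
            for word in words:
                counts[word] = counts.get(word, 0) + 1
    return counts
```

Calling count_words('http://www.dr-chuck.com/page1.htm') would fetch that page over the network and return a dictionary of word counts.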

Like a File…

Reading Web Pages

Following Links

The First Lines of Code @ Google?

Parsing HTML (a.k.a. Web Scraping)

What is Web Scraping?

• When a program or script pretends to be a browser and retrieves web pages, looks at those web pages, extracts information, and then looks at more web pages

• Search engines scrape web pages - we call this “spidering the web” or “web crawling”

Why Scrape?

• Pull data - particularly social data - who links to who?

• Get your own data back out of some system that has no “export capability”

• Monitor a site for new information

• Spider the web to make a database for a search engine

Scraping Web Pages

• There is some controversy about web page scraping and some sites are a bit snippy about it.

• Republishing copyrighted information is not allowed

• Violating terms of service is not allowed

The Easy Way - Beautiful Soup

• You could do string searches the hard way

• Or use the free software library called BeautifulSoup from www.crummy.com
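
To stay free of third-party installs, this sketch does not use BeautifulSoup. It swaps in the html.parser module from the Python standard library to pull the href values out of anchor tags. The class name LinkCollector and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the tag
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)

html = ('<h1>The First Page</h1><p>If you like, you can switch to the '
        '<a href="http://www.dr-chuck.com/page2.htm">Second Page</a>.</p>')
parser = LinkCollector()
parser.feed(html)
print(parser.links)
```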

BeautifulSoup Installation

Summary

TCP/IP gives us pipes / sockets between applications

We designed application protocols to make use of these pipes

HyperText Transfer Protocol (HTTP) is a simple yet powerful protocol

Python has good support for sockets, HTTP, and HTML parsing

Acknowledgements / Contributions

Using Web Services

Chapter 13

Data on the Web

With the HTTP Request/Response well understood and well supported, there was a natural move toward exchanging data between programs using these protocols

We needed to come up with an agreed way to represent data going between applications and across networks

There are two commonly used formats: XML and JSON

Sending Data Across the Net

Agreeing on a Wire Format

XML

Marking up data to send across the network…

XML Elements (or Nodes)

• Simple Element

• Complex Element

eXtensible Markup Language

Primary purpose is to help information systems share structured data

It started as a simplified subset of the Standard Generalized Markup Language (SGML), and is designed to be relatively human-legible

XML Basics

• Start Tag

• End Tag

• Text Content

• Attribute

• Self Closing Tag

White Space

XML Terminology

Tags indicate the beginning and ending of elements

Attributes - Keyword/value pairs on the opening tag of XML

Serialize / De-Serialize - Convert data in one program into a common format that can be stored and/or transmitted between systems in a programming language-independent manner
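
A sketch of de-serializing a small XML document with the standard library's xml.etree.ElementTree. The person record is sample data invented for this example.

```python
import xml.etree.ElementTree as ET

data = '''<person>
  <name>Chuck</name>
  <phone type="intl">+1 734 303 4456</phone>
  <email hide="yes"/>
</person>'''

tree = ET.fromstring(data)

name = tree.find('name').text              # text content of an element
phone_type = tree.find('phone').get('type')  # attribute on the start tag
hide = tree.find('email').get('hide')      # attribute on a self-closing tag
print(name, phone_type, hide)
```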

XML as a Tree

XML Text and Attributes

XML as Paths

XML Schema

Describing a “contract” as to what is acceptable XML

XML Schema

Description of the legal format of an XML document

Expressed in terms of constraints on the structure and content of documents

Often used to specify a “contract” between systems - “My system will only accept XML that conforms to this particular Schema.”

If a particular piece of XML meets the specification of the Schema - it is said to “validate”

Many XML Schema Languages

Document Type Definition (DTD)

Standard Generalized Markup Language (ISO 8879:1986 SGML)

XML Schema from W3C - (XSD)

XSD XML Schema (W3C spec)

We will focus on the World Wide Web Consortium (W3C) version

It is often called “W3C Schema” because “Schema” is considered generic

More commonly it is called XSD because the file names end in .xsd

XSD Structure

xs:element

xs:sequence

xs:complexType

XSD Constraints

XSD Data Types

ISO 8601 Date/Time Format

JavaScript Object Notation

JavaScript Object Notation

• Douglas Crockford - “Discovered” JSON

• Object literal notation in JavaScript
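
A sketch of the same idea in JSON using the standard json module. The record is sample data; json.loads() turns the text into Python dictionaries and json.dumps() serializes it again.

```python
import json

data = '''{
  "name": "Chuck",
  "phone": {"type": "intl", "number": "+1 734 303 4456"},
  "email": {"hide": "yes"}
}'''

info = json.loads(data)        # de-serialize: text -> nested dictionaries
name = info['name']
hide = info['email']['hide']

wire = json.dumps(info)        # serialize: dictionaries -> text
print(name, hide, wire)
```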

Service Oriented Approach

Service Oriented Approach

Most non-trivial web applications use services

They use services from other applications

  • Credit Card Charge

  • Hotel Reservation systems

Services publish the “rules” applications must follow to make use of the service (API)

Multiple Systems

Initially - two systems cooperate and split the problem

As the data/service becomes useful - multiple applications want to use the information / application

APIs

There Are Many APIs

There are organizations that put up public APIs and sell access to those APIs

We will explore a geocoding API based on the OpenStreetMap data

You need an account to access this API

There is a free level of requests over time

You pay above that rate of usage

An API Proxy

To avoid making you get an account, I have a well-hidden web server that acts as a proxy for the Geoapify API

This proxy does not require a password – but it does have rate limits and is heavily cached using an edge-caching service for performance

Summary

Service Oriented Architecture - allows an application to be broken into parts and distributed across a network

An Application Programming Interface (API) is a contract for interaction

Web Services provide infrastructure for applications cooperating (an API) over a network - SOAP and REST are two styles of web services

XML and JSON are serialization formats

Acknowledgements / Contributions

14

Python Objects

Charles Severance

Warning

This lecture is very much about definitions and mechanics for objects

This lecture is a lot more about “how it works” and less about “how you use it”

You won’t get the entire picture until this is all looked at in the context of a real problem

So please suspend disbelief and learn technique for the next 40 or so slides…

Let’s Start with Programs

Object Oriented

• A program is made up of many cooperating objects

• Instead of being the “whole program” - each object is a little “island” within the program that works cooperatively with other objects

• A program is made up of one or more objects working together - objects make use of each other’s capabilities

Object

An Object is a bit of self-contained Code and Data

A key aspect of the Object approach is to break the problem into smaller understandable parts (divide and conquer)

Objects have boundaries that allow us to ignore un-needed detail

We have been using objects all along: String Objects, Integer Objects, Dictionary Objects, List Objects…

Definitions

Class - a template

Method or Message - A defined capability of a class

Field or attribute - A bit of data in a class

Object or Instance - A particular instance of a class
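
A minimal sketch tying the four definitions to code. The class name PartyAnimal and its contents are chosen for illustration.

```python
class PartyAnimal:             # class: the template
    x = 0                      # attribute: a bit of data

    def party(self):           # method: a defined capability
        self.x = self.x + 1
        print('So far', self.x)

an = PartyAnimal()             # object / instance: one particular PartyAnimal
an.party()
an.party()
an.party()
```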

Terminology: Class

Terminology: Instance

Terminology: Method

Some Python Objects

A Sample Class

Playing with dir() and type()

A Nerdy Way to Find Capabilities

The dir() command lists capabilities

Ignore the ones with underscores - these are used by Python itself

The rest are real operations that the object can perform

It is like type() - it tells us something about a variable
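
A quick sketch with a list: type() names the kind of object, and dir() lists what it can do. The comprehension filters out the underscore names.

```python
stuff = list()

kind = type(stuff)
# Keep only the names that do not start with an underscore
public = [name for name in dir(stuff) if not name.startswith('_')]
print(kind)
print(public)
```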

Try dir() with a String

Object Lifecycle

http://en.wikipedia.org/wiki/Constructor_(computer_science)

Object Lifecycle

Objects are created, used, and discarded

We have special blocks of code (methods) that get called

  • At the moment of creation (constructor)

  • At the moment of destruction (destructor)

Constructors are used a lot

Destructors are seldom used

Constructor

The primary purpose of the constructor is to set up some instance variables to have the proper initial values when the object is created

Constructor

In object oriented programming, a constructor in a class is a special block of statements called when an object is created

Many Instances

We can create lots of objects - the class is the template for the object

We can store each distinct object in its own variable

We call this having multiple instances of the same class

Each instance has its own copy of the instance variables
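
A sketch of a constructor and two independent instances. The class and the names Sally and Jim are sample choices; __init__ is the method Python calls at the moment of creation.

```python
class PartyAnimal:
    def __init__(self, name):
        # Constructor: set up the instance variables for this one object
        self.x = 0
        self.name = name

    def party(self):
        self.x = self.x + 1
        print(self.name, 'party count', self.x)

s = PartyAnimal('Sally')
j = PartyAnimal('Jim')

s.party()
s.party()
j.party()   # Jim's count is separate from Sally's
```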

Inheritance

http://www.ibiblio.org/g2swap/byteofpython/read/inheritance.html

Inheritance

When we make a new class - we can reuse an existing class and inherit all the capabilities of an existing class and then add our own little bit to make our new class

Another form of store and reuse

Write once - reuse many times

The new class (child) has all the capabilities of the old class (parent) - and then some more
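
A sketch of a child class that reuses its parent and adds a little of its own. The class names and the seven-point touchdown are illustrative choices.

```python
class PartyAnimal:
    def __init__(self, name):
        self.x = 0
        self.name = name

    def party(self):
        self.x = self.x + 1

class FootballFan(PartyAnimal):        # FootballFan extends PartyAnimal
    def __init__(self, name):
        super().__init__(name)         # reuse the parent's constructor
        self.points = 0

    def touchdown(self):
        self.points = self.points + 7
        self.party()                   # inherited capability

j = FootballFan('Jim')
j.party()
j.touchdown()
print(j.name, j.x, j.points)
```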

Terminology: Inheritance

Definitions

Class - a template

Attribute – A variable within a class

Method - A function within a class

Object - A particular instance of a class

Constructor – Code that runs when an object is created

Inheritance - The ability to extend a class to make a new class.

Summary

Object Oriented programming is a very structured approach to code reuse

We can group data and functionality together and create many independent instances of a class

Acknowledgements / Contributions

Additional Source Information

• “Snowman Cookie Cutter” by Didriks is licensed under CC BY
https://www.flickr.com/photos/dinnerseries/23570475099

• Photo from the television program Lassie, “Lassie watches as Jeff (Tommy Rettig) works on his bike”, is Public Domain
https://en.wikipedia.org/wiki/Lassie#/media/File:Lassie_and_Tommy_Rettig_1956.JPG

15

Relational Databases and SQLite

Charles Severance

SQLite Browser

Random Access

When you can randomly access data…

How can you lay out data to be most efficient?

Sorting might not be the best idea

Relational Databases

Terminology

• Database - contains many tables

• Relation (or table) - contains tuples and attributes

• Tuple (or row) - a set of fields that generally represents an “object” like a person or a music track

• Attribute (also column or field) - one of possibly many elements of data corresponding to the object represented by the row

SQL

Structured Query Language is the language we use to issue commands to the database

  • Create data (a.k.a Insert)

  • Retrieve data

  • Update data

  • Delete data

Web Applications w/ Databases

• Application Developer - Builds the logic for the application, the look and feel of the application - monitors the application for problems

• Database Administrator - Monitors and adjusts the database as the program runs in production

• Often both people participate in the building of the “Data model”

Database Administrator

Database Model

Common Database Systems

• Three major Database Management Systems in wide use

  • Oracle - Large, commercial, enterprise-scale, very very tweakable

  • MySQL - Simpler but very fast and scalable - commercial open source

  • SQL Server - Very nice - from Microsoft (also Access)

• Many other smaller projects, free and open source

  • HSQL, SQLite, Postgres, …

SQLite is in Lots of Software…

SQLite Browser

• SQLite is a very popular database - it is free and fast and small

• SQLite Browser allows us to directly manipulate SQLite files

http://sqlitebrowser.org/

SQLite is embedded in Python and a number of other languages

Let’s Make a Database

https://www.py4e.com/lectures3/Pythonlearn-15-Database-Handout.txt

Start Simple - A Single Table

SQL: Insert

The Insert statement inserts a row into a table

SQL: Delete

Deletes a row in a table based on selection criteria

SQL: Update

Allows the updating of a field with a WHERE clause

Retrieving Records: Select

The select statement retrieves a group of records - you can either retrieve all the records or a subset of the records with a WHERE clause

Sorting with ORDER BY

You can add an ORDER BY clause to SELECT statements to get the results sorted in ascending or descending order
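
A sketch of the four operations plus ORDER BY, run through Python's built-in sqlite3 module against an in-memory database. The Users table and its rows are sample data invented for this example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # temporary database, nothing written to disk
cur = conn.cursor()
cur.execute('CREATE TABLE Users (name TEXT, email TEXT)')

# INSERT: add rows
people = [('Chuck', 'csev@umich.edu'), ('Colleen', 'cvl@umich.edu'),
          ('Ted', 'ted@umich.edu'), ('Sally', 'a1@umich.edu'),
          ('Kristin', 'kf@umich.edu')]
cur.executemany('INSERT INTO Users (name, email) VALUES (?, ?)', people)

# DELETE: remove the rows that match the WHERE clause
cur.execute('DELETE FROM Users WHERE email = ?', ('ted@umich.edu',))

# UPDATE: change a field in the rows that match the WHERE clause
cur.execute('UPDATE Users SET name = ? WHERE email = ?',
            ('Charles', 'csev@umich.edu'))

# SELECT with ORDER BY: retrieve the remaining rows sorted by email
cur.execute('SELECT name, email FROM Users ORDER BY email')
rows = cur.fetchall()
conn.commit()
for row in rows:
    print(row)
```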

SQL Summary

This is not too exciting (so far)

• Tables pretty much look like big fast programmable spreadsheets with rows, columns, and commands

• The power comes when we have more than one table and we can exploit the relationships between the tables

Complex Data Models and Relationships

http://en.wikipedia.org/wiki/Relational_model

Database Design

• Database design is an art form of its own with particular skills and experience

• Our goal is to avoid the really bad mistakes and design clean and easily understood databases

• Others may performance-tune things later

• Database design starts with a picture…

Building a Data Model

• Drawing a picture of the data objects for our application and then figuring out how to represent the objects and their relationships

• Basic Rule: Don’t put the same string data in twice - use a relationship instead

• When there is one thing in the “real world” there should be one copy of that thing in the database

For each “piece of info”…

• Is the column an object or an attribute of another object?

• Once we define objects, we need to define the relationships between objects

Representing Relationships in a Database

Database Normalization (3NF)

There is tons of database theory - way too much to understand without excessive predicate calculus

Do not replicate data - reference data - point at data

Use integers for keys and for references

Add a special “key” column to each table which we will make references to. By convention, many programmers call this column “id”

Integer Reference Pattern

Three Kinds of Keys

• Primary key - generally an integer auto-increment field

• Logical key - What the outside world uses for lookup

• Foreign key - generally an integer key pointing to a row in another table

Key Rules

Best practices

• Never use your logical key as the primary key

• Logical keys can and do change, albeit slowly

• Relationships that are based on matching string fields are less efficient than integers

Foreign Keys

• A foreign key is when a table has a column that contains a key which points to the primary key of another table.

• When all primary keys are integers, then all foreign keys are integers - this is good - very good

Relationship Building (in tables)

Using Join Across Tables

Relational Power

• By removing the replicated data and replacing it with references to a single copy of each bit of data we build a “web” of information that the relational database can read through very quickly - even for very large amounts of data

• Often when you want some data it comes from a number of tables linked by these foreign keys

The JOIN Operation

• The JOIN operation links across several tables as part of a select operation

• You must tell the JOIN how to use the keys that make the connection between the tables using an ON clause
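
A sketch of a two-table JOIN in an in-memory SQLite database. The Artist and Album tables and their rows are sample data; Album.artist_id is the foreign key that points at Artist.id.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.executescript('''
CREATE TABLE Artist (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- primary key
    name TEXT UNIQUE                       -- logical key
);
CREATE TABLE Album (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT,
    artist_id INTEGER                      -- foreign key to Artist.id
);
INSERT INTO Artist (name) VALUES ('Led Zeppelin');
INSERT INTO Artist (name) VALUES ('AC/DC');
INSERT INTO Album (title, artist_id) VALUES ('Who Made Who', 2);
INSERT INTO Album (title, artist_id) VALUES ('IV', 1);
''')

# The ON clause tells JOIN which rows belong together
cur.execute('''SELECT Album.title, Artist.name
               FROM Album JOIN Artist ON Album.artist_id = Artist.id
               ORDER BY Album.title''')
rows = cur.fetchall()
print(rows)
```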

Many-To-Many Relationships

Many to Many

• Sometimes we need to model a relationship that is many-to-many

• We need to add a “connection” table with two foreign keys

• There is usually no separate primary key
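
A sketch of a connection table in an in-memory SQLite database. The User, Course, and Member tables and their rows are sample data; the role column (1 for instructor, 0 for student) is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.executescript('''
CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE);
CREATE TABLE Course (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
-- Connection table: two foreign keys, and the pair acts as the primary key
CREATE TABLE Member (
    user_id INTEGER,
    course_id INTEGER,
    role INTEGER,
    PRIMARY KEY (user_id, course_id)
);
INSERT INTO User (id, name, email) VALUES (1, 'Jane', 'jane@example.com');
INSERT INTO User (id, name, email) VALUES (2, 'Ed', 'ed@example.com');
INSERT INTO User (id, name, email) VALUES (3, 'Sue', 'sue@example.com');
INSERT INTO Course (id, title) VALUES (1, 'Python');
INSERT INTO Course (id, title) VALUES (2, 'SQL');
INSERT INTO Member (user_id, course_id, role) VALUES (1, 1, 1);
INSERT INTO Member (user_id, course_id, role) VALUES (2, 1, 0);
INSERT INTO Member (user_id, course_id, role) VALUES (3, 1, 0);
INSERT INTO Member (user_id, course_id, role) VALUES (1, 2, 0);
INSERT INTO Member (user_id, course_id, role) VALUES (2, 2, 1);
''')

cur.execute('''SELECT User.name, Member.role, Course.title
               FROM User
               JOIN Member ON Member.user_id = User.id
               JOIN Course ON Member.course_id = Course.id
               ORDER BY Course.title, Member.role DESC, User.name''')
rows = cur.fetchall()
for row in rows:
    print(row)
```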

Start with a Fresh Database

Insert Users and Courses

Complexity Enables Speed

• Complexity makes speed possible and allows you to get very fast results as the data size grows

• By normalizing the data and linking it with integer keys, the overall amount of data which the relational database must scan is far lower than if the data were simply flattened out

• It might seem like a tradeoff - spend some time designing your database so it continues to be fast when your application is a success

Additional SQL Topics

• Indexes improve access performance for things like string fields

• Constraints on data - (cannot be NULL, etc..)

• Transactions - allow SQL operations to be grouped and done as a unit

Summary

• Relational databases allow us to scale to very large amounts of data

• The key is to have one copy of any data element and use relations and joins to link the data to multiple places

• This greatly reduces the amount of data which must be scanned when doing complex operations across large amounts of data

• Database and SQL design is a bit of an art form

Acknowledgements / Contributions

16

Retrieving and Visualizing Data

Charles Severance

Multi-Step Data Analysis

Many Data Mining Technologies

https://hadoop.apache.org/

http://spark.apache.org/

https://aws.amazon.com/redshift/

http://community.pentaho.com/

….

“Personal Data Mining”

Our goal is to make you better programmers – not to make you data mining experts

OpenGeo

Makes an annotated Open Street Map from user entered data

Uses the proxied Geoapify API

Caches data in a database to avoid rate limiting and allow restarting

Visualized in a browser using the Open Street Map

Page Rank

Write a simple web page crawler

Compute a simple version of Google’s Page Rank algorithm

Visualize the resulting network
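
A rough sketch of a simplified rank computation, not the course's own program. It is the common textbook form: every page starts with an equal share, then repeatedly passes most of its rank to the pages it links to. The function name page_rank, the damping value 0.85, and the three-page graph are assumptions for illustration.

```python
def page_rank(links, iterations=20, damping=0.85):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base amount of rank
        new = {p: (1 - damping) / n for p in pages}
        for page, outs in links.items():
            targets = [t for t in outs if t in ranks]
            if not targets:
                targets = pages    # a page with no links shares with everyone
            share = damping * ranks[page] / len(targets)
            for t in targets:
                new[t] += share
        ranks = new
    return ranks

graph = {'a': ['b', 'c'], 'b': ['c'], 'c': ['a']}
ranks = page_rank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```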

Search Engine Architecture

Web Crawling

Index Building

Searching

Web Crawler

Web Crawler

Retrieve a page

Look through the page for links

Add the links to a list of “to be retrieved” sites

Repeat…
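
The steps above can be sketched without touching the network by crawling a tiny made-up "web" held in a dictionary. The PAGES data, the crawl function, and the example.com URLs are all hypothetical; a real crawler would retrieve each page over HTTP instead.

```python
import re
from collections import deque

# A pretend web: URL -> HTML (hypothetical pages, so no network is needed)
PAGES = {
    'http://example.com/a': '<a href="http://example.com/b">B</a> '
                            '<a href="http://example.com/c">C</a>',
    'http://example.com/b': '<a href="http://example.com/a">A</a>',
    'http://example.com/c': '<a href="http://example.com/b">B</a> '
                            '<a href="http://example.com/d">D</a>',
}

def crawl(start, max_pages=10):
    to_visit = deque([start])      # the "to be retrieved" list
    seen = {start}
    links = {}
    while to_visit and len(links) < max_pages:
        url = to_visit.popleft()
        html = PAGES.get(url)      # retrieve a page
        if html is None:
            continue               # the page could not be retrieved
        found = re.findall(r'href="(http[^"]+)"', html)  # look for links
        links[url] = found
        for link in found:         # queue links we have not seen yet
            if link not in seen:
                seen.add(link)
                to_visit.append(link)
    return links

graph = crawl('http://example.com/a')
for url, outs in graph.items():
    print(url, '->', outs)
```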

Web Crawling Policy

a selection policy that states which pages to download,

a re-visit policy that states when to check for changes to the pages,

a politeness policy that states how to avoid overloading Web sites, and

a parallelization policy that states how to coordinate distributed Web crawlers

robots.txt

A way for a web site to communicate with web crawlers

An informal and voluntary standard

Sometimes folks make a “Spider Trap” to catch “bad” spiders
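
A sketch of how a polite crawler might check the rules, using urllib.robotparser from the standard library. The rules, the crawler name MyCrawler, and the example.com URLs are made up; a real crawler would download the site's robots.txt rather than parse a string.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (assumed for illustration)
rules = """User-agent: *
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

blocked = rp.can_fetch('MyCrawler', 'http://example.com/private/data.html')
allowed = rp.can_fetch('MyCrawler', 'http://example.com/index.html')
print(blocked, allowed)
```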

Google Architecture

Web Crawling

Index Building

Searching

Search Indexing

Mailing Lists - Gmane

Crawl the archive of a mailing list

Do some analysis / cleanup

Visualize the data as word cloud and lines

Warning: This Dataset is > 1GB

Sadly the original source of this data (gmane.org) has been shut down

We made a copy of a subset of the data before it was shut down

Acknowledgements / Contributions