0:08

Welcome to this week's programming tip.

Now, in this specialization, we've focused mainly on writing scripts that use good code, and up to now, good has meant code that's correct, understandable, and that others can work with easily. However, as you progress, you're going to see that there's another metric by which we assess the quality of your code, and that's its efficiency. We're really interested in how fast your code runs on a problem of a particular size.

If you move on into the larger field of computer science, one of the core topics in this area is algorithms, and algorithms are common recipes for solving important computational problems. One of the ways we measure how good an algorithm is, is to talk about its efficiency: how fast it runs on an input of a given size.

In this video, what I want to do is talk a little bit about some of the core concepts that come up when we're talking about the efficiency of algorithms. Then, in a follow-up video, we'll talk about a couple of different approaches to joining two CSV files in Python, and how this idea of efficiency can sway which approach we take. Let's go do it.

Okay. Let's talk a little bit about how you measure the efficiency of your code. There's a whole subject area in computer science that focuses on this, and one of the key ideas is algorithmic efficiency. Your code implements an algorithm, some mathematical recipe for doing something, and you want to know how good it is. This is a Wiki page that gives a very general overview of things. You might look it over just to get some vague idea of how things work.

There are two ways that you can go about measuring the efficiency of your code. The first is called benchmarking, and it's probably the one that you're most used to. The idea is you take an algorithm, you write it up in some language, say Python, you run your Python code on some particular machine, and you put in timing code to see how long a particular portion of the code took to run, and you get a number out. When people want to measure how well things work, they often use what are called benchmarks: they'll take a common problem, implement it in a particular language, run it on their machine, and see how long it takes. If you've ever bought a graphics card for one of your computers, you've probably done some benchmarking to see how well your graphics card works on a particular game.
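As a rough sketch of what benchmarking looks like in practice, here's a minimal timing harness using Python's standard-library time module. The workload function here is just an illustrative example, not something from the lecture:

```python
import time

def sum_of_squares(n):
    """Toy workload: sum the squares of 0 through n - 1."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Benchmark: time the workload for one particular input size, on
# this particular machine, under this particular Python interpreter.
start = time.perf_counter()
result = sum_of_squares(1_000_000)
elapsed = time.perf_counter() - start
print(f"sum_of_squares(1_000_000) took {elapsed:.4f} seconds")
```

The number you get out is only meaningful for this machine, this language, and this implementation of the language.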

Now, that's a nice way to understand a little bit about how fast your code runs, but there are some drawbacks here. The absolute running time for your code depends on a lot of factors beyond just how good your mathematical algorithm is for solving a problem. It depends on how fast your computer is, and it depends on which language you chose: did you use Java or C or Python? It also depends on the particular implementation of the language that you chose to use. If you've used CodeSkulptor 3, you'll see it runs a lot slower than regular old Python.

There's another way to measure the efficiency of your code and of the algorithms you're going to use, and that's a theoretical method. That method instead looks at the running time of your code in terms of how fast the running time increases as the size of the input increases. Notice that this approach washes out a lot of factors, like how fast your machine is: whether the machine is fast or slow, we still see the same growth in the running time. Which language you choose, same thing: which language you use doesn't really impact how the running time increases as you increase the size of the input.

Now, I'm not going to go into all the details, but the common way we measure this in computer science is with what's called Big O notation. Big O of some function gives you a bound on the growth of the running time as the size of the input increases. For a few examples here: O(1) just means that the running time of your code is constant, and if you double the size of the input, the running time doesn't really change. That's a really good property to have. Another common one is O(n), which is linear, and that means that if you double the size of your input, the running time of your code doubles. Pretty good. It means basically your running time is proportional to the amount of input you have. That's a very reasonable thing. Something a little slower is quadratic, that's O(n^2). That means if you double the size of the input, then the running time goes up by a factor of four. It's not the end of the world, but as you keep doubling, things get a lot slower, so it's not great. So those are three examples of common running times in Big O notation: O(1) is great, O(n) is pretty good, and O(n^2) is okay but not great.
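To make those three growth rates concrete, here's a small sketch; these functions are my own illustrative examples, not from the lecture. Doubling the input size leaves the constant-time function's work unchanged, doubles the linear one's, and quadruples the quadratic one's:

```python
def constant_work(items):
    # O(1): looks at a single element, no matter how long the list is.
    return items[0]

def linear_work(items):
    # O(n): touches each of the n elements exactly once.
    total = 0
    for x in items:
        total += x
    return total

def quadratic_work(items):
    # O(n^2): examines every pair of elements, so n * n steps.
    pairs = 0
    for x in items:
        for y in items:
            pairs += 1
    return pairs

small = list(range(100))   # n = 100
big = list(range(200))     # n = 200: double the input size

print(quadratic_work(small))  # 10000 pair checks
print(quadratic_work(big))    # 40000 pair checks: four times as many
```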

Okay, let's move on now to Python, where we'll talk a little bit about how Python is implemented and try to understand the efficiency of the code that you've written. We want to use this Big O notation to understand the efficiency of some operations that we commonly use inside Python. There's a very helpful page that you should be aware of that sits out on python.org, called the Time Complexity page, which talks about some of the specifics of how fast certain operations run inside Python.

Now, when you first started working with Python, maybe you used some kind of debugger or stepper, and you tried to understand the running time of your code in terms of how many statements had to be evaluated to arrive at a particular result. That was a reasonable starting point for understanding how Python works, but there is one subtlety in Python that you should be aware of: executing one statement in Python doesn't always take constant time. It can take a lot more time based on the size of the data structure that you're working with. Remember, Python is implemented in terms of a much lower-level language like C, and the C code behind one line of Python could be doing lots of stuff.
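For instance, copying a list is a single Python statement, but the C code underneath has to walk the whole list, so the statement's cost grows with the list's length. A small sketch:

```python
data = list(range(1_000_000))

# One line of Python, but under the hood the interpreter's C code
# visits all one million elements to build the new list, so this
# statement's running time is O(n), not O(1).
duplicate = data.copy()

print(len(duplicate))  # 1000000: every element was copied
```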

What I want to do is talk a little bit about the speed of operations for two important data types in Python, lists and dictionaries. So let's do that next.

Okay, let's talk about the running time of various list operations in Python. Here we are in the Time Complexity page, looking at lists, and here you have a collection of operations that we can run on a list of length n, so there are n items in the list. The first operation is making a copy of the list. It turns out that the running time is O(n): you have to go through the entire list to make that copy, so that makes sense.

The two operations that are most useful on a list, getting a particular item and setting a particular item, are O(1). The time it takes to either get or update an item in a list, like the tenth item, doesn't depend on whether the list is of length a thousand or a million. It still takes the same amount of time. That's a very nice property, and that's why lists are very useful.

There's one other operation that you'll be using fairly frequently with lists, and it's down here: you want to check if something is in the list. That looks like an innocuous operation. Is something in a list? That can't be too hard to do. Well, it turns out the only way you can check if something is in a list is to actually go through the entire list and compare that particular item against every item in the list. That's O(n), that's linear time. This is the most common cause of mystery when people are writing Python code: they use "x in s" where s is a long list, and they think one line of Python must be really fast, but then their code runs really slowly as a result.
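A quick sketch of that surprise: the single statement below has to scan all five million items before it can answer, so it is nowhere near instant. (The exact timing will vary by machine.)

```python
import time

haystack = list(range(5_000_000))

start = time.perf_counter()
found = -1 in haystack  # worst case: -1 is absent, so every item is checked
elapsed = time.perf_counter() - start

print(f"found={found}, scan took {elapsed:.4f} seconds")
```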

So, let's look at dictionaries now. We're going to see that dictionaries have slightly different running times for important operations, particularly checking membership.

Okay, to finish off the lecture, let's talk a little bit about the running time of various operations for dictionaries in Python. Now, the important thing about dictionaries is that they're implemented using an algorithmic technique called hashing. The important thing about hashing is that if you have a collection of n items, we can check whether a particular item is in that collection in constant time, that's O(1). So we can get an item in constant time, and we can set an item in constant time. Notice that's the same as what we had for lists, but although it's not mentioned here, there's one important difference between lists and dictionaries: we can check to see if a particular key is in a dictionary in constant time, O(1). Remember, for a list it was O(n). So there are going to be times when using a dictionary is going to lead to a significant improvement in the efficiency of your code.
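Here's a sketch that contrasts the two membership checks directly; the exact timings depend on your machine, but the dictionary lookup should be dramatically faster than the list scan:

```python
import time

n = 2_000_000
as_list = list(range(n))
as_dict = {k: True for k in range(n)}

def timed_membership(container, probe):
    """Return (result, seconds) for a single 'probe in container' check."""
    start = time.perf_counter()
    result = probe in container
    return result, time.perf_counter() - start

# Worst-case probe: -1 is in neither container.
_, list_seconds = timed_membership(as_list, -1)  # O(n): scans all n items
_, dict_seconds = timed_membership(as_dict, -1)  # O(1): one hash lookup

print(f"list: {list_seconds:.4f}s   dict: {dict_seconds:.6f}s")
```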

In fact, that's what we're going to look at next: we're going to look at the problem of joining two CSV files, and we're going to motivate using dictionaries over lists in this particular case.