What is a plain English explanation of Big O? With as little formal definition as possible and simple mathematics.

Big O is just a common way to express "How much time / space does it take to run my code?".

You will often see O(n), O(n^2), O(n log n) and so forth. These are all just ways to show how an algorithm's cost changes as its input grows.

O(n) means the Big O is n, and now you might think, "What is n!?" Well, "n" is the number of elements. Imagine you want to search for an item in an array. You would have to look at each element and ask "Are you the correct element/item?" In the worst case, the item is at the last index, which means it took as many steps as there are items in the list. So to be generic, we say "oh hey, n is a fair name for a given number of values!".
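To make the search above concrete, here is a minimal sketch in C++. The names `SearchResult` and `linear_search` are made up for illustration; the function counts how many elements it had to look at so you can see the count grow with the list.

```cpp
#include <cstddef>
#include <vector>

// Result of a search: where the item was found (or -1) and how many
// elements had to be inspected along the way.
struct SearchResult {
    int index;
    std::size_t comparisons;
};

// Look at each element in turn and ask "are you the item I want?".
SearchResult linear_search(const std::vector<int>& items, int target) {
    std::size_t comparisons = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        ++comparisons;
        if (items[i] == target) {
            return {static_cast<int>(i), comparisons};
        }
    }
    return {-1, comparisons};  // not found: every element was inspected
}
```

Searching a 6-element list for its last element inspects all 6 elements, and a list twice as long would need twice as many looks in the worst case.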

So then you might understand what "n^2" means, but to be even more specific, imagine you have one of the simplest sorting algorithms: bubble sort. This algorithm needs to look through the whole list, for each item.

My list

- 1
- 6
- 3

The flow here would be:

- Compare 1 and 6, which is biggest? Ok 6 is in the right position, moving forward!
- Compare 6 and 3, oh, 3 is less! Let's move that. OK, the list changed, so we need to start from the beginning now!

This is O(n^2) because you need to look at all the items in the list, and there are "n" items. For each item, you look at all the items once more to compare them, which is also "n". So for every item you look "n" times, meaning n*n = n^2.
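Here is a rough sketch of that idea in C++. It is one simple bubble sort variant (repeat full passes until nothing moves), not the only way to write it, and the name `bubble_sort` plus the comparison counter are just for illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts `items` in place with a simple bubble sort and returns the number
// of comparisons performed.
std::size_t bubble_sort(std::vector<int>& items) {
    std::size_t comparisons = 0;
    bool swapped = true;
    while (swapped) {  // keep making passes until nothing moves
        swapped = false;
        for (std::size_t i = 0; i + 1 < items.size(); ++i) {
            ++comparisons;
            if (items[i] > items[i + 1]) {
                std::swap(items[i], items[i + 1]);
                swapped = true;
            }
        }
    }
    return comparisons;
}
```

For a list in reverse order this variant makes n passes of n-1 comparisons each, so the count grows like n*n.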

I hope this is as simple as you want it.

But remember, Big O is just a way to express yourself in terms of time and space.

EDIT: Quick note, this is almost certainly confusing Big O notation (which is an upper bound) with Theta notation (which is both an upper and lower bound). In my experience this is actually typical of discussions in non-academic settings. Apologies for any confusion caused.

In one sentence: As the size of your job goes up, how much longer does it take to complete it?

Obviously that's only using "size" as the input and "time taken" as the output — the same idea applies if you want to talk about memory usage etc.

Here's an example where we have N T-shirts which we want to dry. We'll *assume* it's incredibly quick to get them in the drying position (i.e. the human interaction is negligible). That's not the case in real life, of course...

Using a washing line outside: assuming you have an infinitely large back yard, washing dries in O(1) time. However much you have of it, it'll get the same sun and fresh air, so the size doesn't affect the drying time.

Using a tumble dryer: you put 10 shirts in each load, and then they're done an hour later. (Ignore the actual numbers here; they're irrelevant.) So drying 50 shirts takes *about* 5 times as long as drying 10 shirts.

Putting everything in an airing cupboard: If we put everything in one big pile and just let general warmth do it, it will take a long time for the middle shirts to get dry. I wouldn't like to guess at the detail, but I suspect this is at least O(N^2): as you increase the wash load, the drying time increases faster.

One important aspect of "big O" notation is that it *doesn't* say which algorithm will be faster for a given size. Take a hashtable (string key, integer value) vs an array of pairs (string, integer). Is it faster to find a key in the hashtable or an element in the array, based on a string? (i.e. for the array, "find the first element where the string part matches the given key.") Hashtables are generally amortised (~= "on average") O(1) — once they're set up, it should take about the same time to find an entry in a 100 entry table as in a 1,000,000 entry table. Finding an element in an array (based on content rather than index) is linear, i.e. O(N) — on average, you're going to have to look at half the entries.

Does this make a hashtable faster than an array for lookups? Not necessarily. If you've got a very small collection of entries, an array may well be faster — you may be able to check all the strings in the time that it takes to just calculate the hashcode of the one you're looking at. As the data set grows larger, however, the hashtable will eventually beat the array.
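As a sketch of the two lookups being compared, assuming C++ standard containers (`std::vector` of pairs for the array, `std::unordered_map` for the hashtable); the function names are hypothetical:

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Array of pairs: scan until the string part matches. O(N) comparisons.
// Returns a pointer to the value, or nullptr if the key is absent.
const int* find_in_array(const std::vector<std::pair<std::string, int>>& entries,
                         const std::string& key) {
    for (const auto& entry : entries) {
        if (entry.first == key) return &entry.second;
    }
    return nullptr;
}

// Hash table: hash the key once and jump to its bucket. O(1) on average.
const int* find_in_table(const std::unordered_map<std::string, int>& table,
                         const std::string& key) {
    auto it = table.find(key);
    return it == table.end() ? nullptr : &it->second;
}
```

Which one wins for a handful of entries depends on the cost of hashing versus a few string comparisons, so measure before assuming.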

Quick note, this is almost certainly confusing Big O notation (which is an upper bound) with Theta notation (which is a two-sided bound). In my experience this is actually typical of discussions in non-academic settings. Apologies for any confusion caused.

The simplest definition I can give for Big-O notation is this:

**Big-O notation is a relative representation of the complexity of an algorithm.**

There are some important and deliberately chosen words in that sentence:

- **relative:** you can only compare apples to apples. You can't compare an algorithm to do arithmetic multiplication to an algorithm that sorts a list of integers. But a comparison of two algorithms to do arithmetic operations (one multiplication, one addition) will tell you something meaningful;
- **representation:** Big-O (in its simplest form) reduces the comparison between algorithms to a single variable. That variable is chosen based on observations or assumptions. For example, sorting algorithms are typically compared based on comparison operations (comparing two nodes to determine their relative ordering). This assumes that comparison is expensive. But what if comparison is cheap but swapping is expensive? It changes the comparison; and
- **complexity:** if it takes me one second to sort 10,000 elements how long will it take me to sort one million? Complexity in this instance is a relative measure to something else.

Come back and reread the above when you've read the rest.

The best example of Big-O I can think of is doing arithmetic. Take two numbers (123456 and 789012). The basic arithmetic operations we learnt in school were:

- addition;
- subtraction;
- multiplication; and
- division.

Each of these is an operation or a problem. A method of solving these is called an **algorithm**.

Addition is the simplest. You line the numbers up (to the right) and add the digits in a column writing the last number of that addition in the result. The 'tens' part of that number is carried over to the next column.

Let's assume that the addition of these numbers is the most expensive operation in this algorithm. It stands to reason that to add these two numbers together we have to add together 6 digits (and possibly carry a 7th). If we add two 100 digit numbers together we have to do 100 additions. If we add **two** 10,000 digit numbers we have to do 10,000 additions.

See the pattern? The **complexity** (being the number of operations) is directly proportional to the number of digits *n* in the larger number. We call this **O(n)** or **linear complexity**.
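If you want to see that count for yourself, here is a small sketch of pen-and-paper addition on digit strings. The function name `add_decimal` and the `digit_additions` counter are invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Adds two non-negative decimal numbers given as digit strings, column by
// column from the right. `digit_additions` receives the number of columns added.
std::string add_decimal(const std::string& a, const std::string& b,
                        std::size_t& digit_additions) {
    std::string result;
    int carry = 0;
    digit_additions = 0;
    std::size_t i = a.size(), j = b.size();
    while (i > 0 || j > 0) {
        int sum = carry;
        if (i > 0) sum += a[--i] - '0';
        if (j > 0) sum += b[--j] - '0';
        result.push_back(static_cast<char>('0' + sum % 10));
        carry = sum / 10;
        ++digit_additions;  // one column handled
    }
    if (carry > 0) result.push_back(static_cast<char>('0' + carry));
    std::reverse(result.begin(), result.end());  // digits were built right to left
    return result;
}
```

Adding 123456 and 789012 this way handles 6 columns, and the column count always equals the number of digits in the longer number.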

Subtraction is similar (except you may need to borrow instead of carry).

Multiplication is different. You line the numbers up, take the first digit in the bottom number and multiply it in turn against each digit in the top number and so on through each digit. So to multiply our two 6 digit numbers we must do 36 multiplications. We may need to do as many as 10 or 11 column adds to get the end result too.

If we have two 100-digit numbers we need to do 10,000 multiplications and 200 adds. For two one million digit numbers we need to do one trillion (10^{12}) multiplications and two million adds.

As the algorithm scales with n-*squared*, this is **O(n^{2})** or **quadratic complexity**.

**We only care about the most significant portion of complexity.**

The astute may have realized that we could express the number of operations as: n^{2} + 2n. But as you saw from our example with two numbers of a million digits apiece, the second term (2n) becomes insignificant (accounting for 0.0002% of the total operations by that stage).
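The schoolbook method can be sketched as follows. This is only an illustration that counts single-digit multiplications (the n^{2} part); `multiply_decimal` and its counter are hypothetical names, and the column adds are folded into a single carry pass rather than counted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Schoolbook multiplication of two non-negative decimal digit strings.
// `digit_multiplications` receives the number of single-digit products.
std::string multiply_decimal(const std::string& a, const std::string& b,
                             std::size_t& digit_multiplications) {
    digit_multiplications = 0;
    std::vector<int> cells(a.size() + b.size(), 0);  // least significant first
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            int da = a[a.size() - 1 - i] - '0';
            int db = b[b.size() - 1 - j] - '0';
            cells[i + j] += da * db;  // every digit meets every digit once
            ++digit_multiplications;
        }
    }
    int carry = 0;
    for (int& cell : cells) {  // propagate carries through the columns
        cell += carry;
        carry = cell / 10;
        cell %= 10;
    }
    std::size_t k = cells.size();
    while (k > 1 && cells[k - 1] == 0) --k;  // drop leading zeros
    std::string result;
    while (k > 0) result.push_back(static_cast<char>('0' + cells[--k]));
    return result;
}
```

Multiplying the two 6-digit numbers from earlier performs 36 digit multiplications, and doubling the digit count quadruples that number.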

Notice that we've assumed the worst-case scenario here. When multiplying two numbers of up to 6 digits, if one of them has 4 digits and the other has 6, then we only need 24 multiplications. Still, we calculate the worst-case scenario for that 'n', i.e. when both are 6-digit numbers. So in this example Big-O describes the worst-case scenario of the algorithm.

The next best example I can think of is the telephone book, normally called the White Pages or similar but it'll vary from country to country. But I'm talking about the one that lists people by surname and then initials or first name, possibly address and then telephone numbers.

Now if you were instructing a computer to look up the phone number for "John Smith" in a telephone book that contains 1,000,000 names, what would you do? Ignoring the fact that you could guess how far in the S's started (let's assume you can't), what would you do?

A typical implementation might be to open up to the middle, take the 500,000^{th} and compare it to "Smith". If it happens to be "Smith, John", we just got real lucky. Far more likely is that "John Smith" will be before or after that name. If it's after we then divide the last half of the phone book in half and repeat. If it's before then we divide the first half of the phone book in half and repeat. And so on.

This is called a **binary search** and is used every day in programming whether you realize it or not.

So if you want to find a name in a phone book of a million names you can actually find any name by doing this at most 20 times. In comparing search algorithms we decide that the comparison is the operation we count, and the number of names is our 'n'.

- For a phone book of 3 names it takes 2 comparisons (at most).
- For 7 it takes at most 3.
- For 15 it takes 4.
- …
- For 1,000,000 it takes 20.

That is staggeringly good isn't it?

In Big-O terms this is **O(log n)** or **logarithmic complexity**. Now the logarithm in question could be ln (base e), log_{10}, log_{2} or some other base. It doesn't matter it's still O(log n) just like O(2n^{2}) and O(100n^{2}) are still both O(n^{2}).
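A minimal sketch of the phone book search, assuming the names are already sorted and stored in a `std::vector<std::string>`. The names `LookupResult` and `binary_search_names` are made up here; the function counts how many entries it inspects.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct LookupResult {
    bool found;
    std::size_t comparisons;  // number of entries inspected
};

// Binary search over names sorted in ascending order.
LookupResult binary_search_names(const std::vector<std::string>& sorted_names,
                                 const std::string& target) {
    std::size_t lo = 0, hi = sorted_names.size();  // search window is [lo, hi)
    std::size_t comparisons = 0;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;  // open the book in the middle
        ++comparisons;
        int order = target.compare(sorted_names[mid]);
        if (order == 0) return {true, comparisons};
        if (order < 0) hi = mid;   // target is in the first half
        else lo = mid + 1;         // target is in the second half
    }
    return {false, comparisons};
}
```

With 7 names, no lookup inspects more than 3 entries, which matches the list above.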

It's worthwhile at this point to explain that Big O can be used to determine three cases with an algorithm:

- **Best Case:** In the telephone book search, the best case is that we find the name in one comparison. This is **O(1)** or **constant complexity**;
- **Expected Case:** As discussed above this is O(log n); and
- **Worst Case:** This is also O(log n).

Normally we don't care about the best case. We're interested in the expected and worst case. Sometimes one or the other of these will be more important.

Back to the telephone book.

What if you have a phone number and want to find a name? The police have a reverse phone book but such look-ups are denied to the general public. Or are they? Technically you can reverse look-up a number in an ordinary phone book. How?

You start at the first name and compare the number. If it's a match, great, if not, you move on to the next. You have to do it this way because the phone book is **unordered** (by phone number anyway).

So to find a name:

- **Best Case:** O(1);
- **Expected Case:** O(n) (for 500,000); and
- **Worst Case:** O(n) (for 1,000,000).

The Travelling Salesman problem is quite a famous problem in computer science and deserves a mention. In this problem you have N towns. Each of those towns is linked to 1 or more other towns by a road of a certain distance. The problem is to find the shortest tour that visits every town.

Sounds simple? Think again.

If you have 3 towns A, B and C with roads between all pairs then you could go:

- A → B → C
- A → C → B
- B → C → A
- B → A → C
- C → A → B
- C → B → A

Well, actually there are fewer than that because some of these are equivalent (A → B → C and C → B → A are equivalent, for example, because they use the same roads, just in reverse).

In actuality there are 3 possibilities.

- Take this to 4 towns and you have (iirc) 12 possibilities.
- With 5 it's 60.
- 6 becomes 360.

This is a function of a mathematical operation called a **factorial**. Basically:

- 5! = 5 × 4 × 3 × 2 × 1 = 120
- 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
- 7! = 7 × 6 × 5 × 4 × 3 × 2 × 1 = 5040
- …
- 25! = 25 × 24 × … × 2 × 1 = 15,511,210,043,330,985,984,000,000
- …
- 50! = 50 × 49 × … × 2 × 1 = 3.04140932 × 10^{64}

So the Big-O of the Travelling Salesman problem is **O(n!)** or **factorial or combinatorial complexity**.
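To check the route counts above by brute force, here is a small sketch using `std::next_permutation`. It only counts routes (it does not measure distances), and `count_routes` is a name invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Counts the distinct routes through `towns` fully connected towns by brute
// force, treating a route and its reverse as the same route.
std::size_t count_routes(int towns) {
    std::vector<int> order(towns);
    std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., towns-1
    std::size_t routes = 0;
    do {
        // Of each route/reverse pair, count only the one that starts lower.
        if (towns < 2 || order.front() < order.back()) ++routes;
    } while (std::next_permutation(order.begin(), order.end()));
    return routes;
}
```

This gives 3, 12, 60 and 360 routes for 3, 4, 5 and 6 towns, i.e. n!/2, and the loop itself already takes n! steps.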

**By the time you get to 200 towns there isn't enough time left in the universe to solve the problem with traditional computers.**

Something to think about.

Another point I wanted to make quick mention of is that any algorithm that has a complexity of **O(n^{a})** is said to have **polynomial complexity** or to be solvable in **polynomial time**.

O(n), O(n^{2}) etc. are all polynomial time. Some problems cannot be solved in polynomial time, or at least nobody knows how to. Certain things are used in the world because of this. Public Key Cryptography is a prime example. It is computationally hard to find the two prime factors of a very large number. If it wasn't, we couldn't use the public key systems we use.

Anyway, that's it for my (hopefully plain English) explanation of Big O (revised).

Big O is a measure of how much time/space an algorithm uses relative to the size of its input.

If an algorithm is O(n) then the time/space will increase at the same rate as its input.

If an algorithm is O(n^2) then the time/space increases at the rate of its input squared.

and so on.

Big O describes an upper limit on the growth behaviour of a function, for example the runtime of a program, when inputs become large.

Examples:

- O(n): If I double the input size the runtime doubles
- O(n^{2}): If the input size doubles the runtime quadruples
- O(log n): If the input size doubles the runtime increases by one
- O(2^{n}): If the input size increases by one, the runtime doubles

The input size is usually the space in bits needed to represent the input.

It shows how an algorithm scales.

**O(n^{2}):**

- 1 item: 1 second
- 10 items: 100 seconds
- 100 items: 10000 seconds

Notice that the number of items increases by a factor of 10, but the time increases by a factor of 10^{2}. Basically, n=10 and so O(n^{2}) gives us the scaling factor n^{2} which is 10^{2}.

**O(n):**

- 1 item: 1 second
- 10 items: 10 seconds
- 100 items: 100 seconds

This time the number of items increases by a factor of 10, and so does the time. n=10 and so O(n)'s scaling factor is 10.

**O(1):**

- 1 item: 1 second
- 10 items: 1 second
- 100 items: 1 second

The number of items is still increasing by a factor of 10, but the scaling factor of O(1) is always 1.

That's the gist of it. They reduce the maths down so it might not be exactly n^{2} or whatever they say it is, but that'll be the dominating factor in the scaling.

**Big O describes the fundamental scaling nature of an algorithm.**

There is a lot of information that Big O does not tell you about a given algorithm. It cuts to the bone and gives only information about the scaling nature of an algorithm, specifically how the resource use (think time or memory) of an algorithm scales in response to the "input size".

Consider the difference between a steam engine and a rocket. They are not merely different varieties of the same thing (as, say, a Prius engine vs. a Lamborghini engine) but they are dramatically different kinds of propulsion systems, at their core. A steam engine may be faster than a toy rocket, but no steam piston engine will be able to achieve the speeds of an orbital launch vehicle. This is because these systems have different scaling characteristics with regards to the relation of fuel required ("resource usage") to reach a given speed ("input size").

Why is this so important? Because software deals with problems that may differ in size by factors up to a trillion. Consider that for a moment. The ratio between the speed necessary to travel to the Moon and human walking speed is less than 10,000:1, and that is absolutely tiny compared to the range in input sizes software may face. And because software may face an astronomical range in input sizes there is the potential for the Big O complexity of an algorithm, its fundamental scaling nature, to trump any implementation details.

Consider the canonical sorting example. Bubble-sort is O(n^2) while merge-sort is O(n log n). Let's say you have two sorting applications, application A which uses bubble-sort and application B which uses merge-sort, and let's say that for input sizes of around 30 elements application A is 1,000x faster than application B at sorting. If you never have to sort much more than 30 elements then it's obvious that you should prefer application A, as it is much faster at these input sizes. However, if you find that you may have to sort ten million items then what you'd expect is that application B actually ends up being roughly 70 times faster than application A in this case, with the gap still widening as the input grows, entirely due to the way each algorithm scales.
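You can play with these figures yourself. The sketch below assumes a deliberately idealized model in which bubble sort costs exactly a·n² and merge sort exactly b·n·log2(n), with no lower-order terms; real programs will not follow it precisely. The function names are hypothetical.

```cpp
#include <cmath>

// Idealized cost models (assumption: no lower-order terms at all).
double bubble_cost(double n) { return n * n; }
double merge_cost(double n) { return n * std::log2(n); }

// How many times faster application B (merge sort) is than application A
// (bubble sort) at size `n`, given that A is `head_start` times faster
// than B at size `n0`.
double merge_speedup(double n, double n0, double head_start) {
    // Constant-factor ratio b/a implied by the measurement at n0.
    double b_over_a = head_start * bubble_cost(n0) / merge_cost(n0);
    return bubble_cost(n) / (b_over_a * merge_cost(n));
}
```

Calling `merge_speedup(1e7, 30, 1000)` plugs in the numbers from the example; trying larger values of `n` shows the advantage continuing to grow.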

Big O notation is a way of describing the upper bound of an algorithm in terms of space or running time. The n is the number of elements in the problem (i.e. size of an array, number of nodes in a tree, etc.). We are interested in describing the running time as n gets big.

When we say some algorithm is O(f(n)) we are saying that the running time (or space required) by that algorithm is always lower than some constant times f(n).

To say that binary search has a running time of O(logn) is to say that there exists some constant c which you can multiply log(n) by that will always be larger than the running time of binary search. In this case you will always have some constant factor of log(n) comparisons.

In other words, where g(n) is the running time of your algorithm, we say that g(n) = O(f(n)) when g(n) <= c*f(n) for all n > k, where c and k are some constants.
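A quick worked instance may help. The running time g(n) = 3n + 10 below is made up purely for illustration; the constants c = 4 and k = 10 are one valid choice among many.

```latex
g(n) = 3n + 10 \le 3n + n = 4n \quad \text{for all } n > 10
\;\Longrightarrow\; g(n) \le c \cdot f(n) \text{ with } f(n) = n,\ c = 4,\ k = 10
\;\Longrightarrow\; g(n) = O(n)
```

The "+ 10" is absorbed once n is larger than 10, which is exactly what the "when n > k" part of the definition allows.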

Big-O notation (also called "asymptotic growth" notation) is *what functions "look like" when you ignore constant factors and stuff near the origin*. We use it to talk about **how things scale**.

**Basics**

**for "sufficiently" large inputs...**

- `f(x) ∈ O(upperbound)` means `f` "grows no faster than" `upperbound`
- `f(x) ∈ Ɵ(justlikethis)` means `f` "grows exactly like" `justlikethis`
- `f(x) ∈ Ω(lowerbound)` means `f` "grows no slower than" `lowerbound`

Big-O notation doesn't care about constant factors: the function `9x²` is said to "grow exactly like" `10x²`. Neither does big-O *asymptotic* notation care about *non-asymptotic* stuff ("stuff near the origin" or "what happens when the problem size is small"): the function `10x²` is said to "grow exactly like" `10x² - x + 2`.

Why would you want to ignore the smaller parts of the equation? Because they become completely dwarfed by the big parts of the equation as you consider larger and larger scales; their contribution becomes dwarfed and irrelevant. (See example section.)

Put another way, it's all about the **ratio**. *If you divide the actual time it takes by the O(...), you will get a constant factor in the limit of large inputs.* Intuitively this makes sense: functions "scale like" one another if you can multiply one to get the other. That is, when we say...

```
actualAlgorithmTime(N) ∈ O(bound(N))
e.g. "time to mergesort N elements
is O(N log(N))"
```

... this means that ** for "large enough" problem sizes N** (if we ignore stuff near the origin), there exists some constant (e.g. 2.5, completely made up) such that:

```
actualAlgorithmTime(N)                 e.g. "mergesort_duration(N)"
────────────────────── < constant            ───────────────────── < 2.5
       bound(N)                                    N log(N)
```

There are many choices of constant; often the "best" choice is known as the "constant factor" of the algorithm... but we often ignore it like we ignore non-largest terms (see Constant Factors section for why they don't usually matter). You can also think of the above equation as a bound, saying "*In the worst-case scenario, the time it takes will never be worse than roughly N log(N), within a factor of 2.5 (a constant factor we don't care much about)*".

In general, `O(...)` is the most useful one because we often care about worst-case behavior. If `f(x)` represents something "bad" like processor or memory usage, then "`f(x) ∈ O(upperbound)`" means "`upperbound` is the worst-case scenario of processor/memory usage".

**Intuition**

This lets us make statements like...

```
"For large enough inputsize=N, and a constant
factor of 1, if I double the input size...
... I double the time it takes." ( O(N) )
... I quadruple the time it takes." ( O(N²) )
... I add 1 to the time it takes." ( O(log(N)) )
... I don't change the time it takes." ( O(1) )
```

(with credit to http://stackoverflow.com/a/487292/711085 )

**Applications**

As a purely mathematical construct, big-O notation is not limited to talking about processing time and memory. You can use it to discuss the asymptotics of anything where scaling is meaningful, such as:

- the number of possible handshakes among `N` people at a party (`Ɵ(N²)`, specifically `N(N-1)/2`, but what matters is that it "scales like" `N²`)
- probabilistic expected number of people who have seen some viral marketing as a function of time
- how website latency scales with the number of processing units in a CPU or GPU or computer cluster
- how heat output scales on CPU dies as a function of transistor count, voltage, etc.

**Example**

For the handshake example, `#handshakes ∈ Ɵ(N²)`. The number of handshakes is exactly n-choose-2 or `(N²-N)/2` (each of N people shakes the hands of N-1 other people, but this double-counts handshakes so divide by 2). However, for very large numbers of people, the linear term `N` is dwarfed and effectively contributes 0 to the ratio. Therefore the scaling behavior is `order N²`, or the number of handshakes "grows like N²".

```
#handshakes(N)
────────────── ≈ 1/2
N²
```

If you wanted to prove this to yourself, you could perform some simple algebra on the ratio to split it up into multiple terms (`lim` means "considered in the limit of", you can ignore it if it makes you feel better):

```
    N²/2 - N/2         (N²)/2   N/2         1/2
lim ────────── = lim ( ────── - ─── ) = lim ─── = 1/2
N→∞     N²       N→∞     N²     N²      N→∞  1
                               ┕━━━┙
             this is 0 in the limit of N→∞:
             graph it, or plug in a really large number for N
```
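If you would rather plug in numbers than do algebra, here is a small sketch that counts handshakes by brute force and computes the ratio to N². The function names are made up for the example.

```cpp
#include <cstdint>

// Counts handshakes by brute force: every unordered pair (i, j) with i < j
// shakes hands exactly once.
std::uint64_t count_handshakes(std::uint64_t people) {
    std::uint64_t handshakes = 0;
    for (std::uint64_t i = 0; i < people; ++i) {
        for (std::uint64_t j = i + 1; j < people; ++j) {
            ++handshakes;
        }
    }
    return handshakes;
}

// Ratio #handshakes(N) / N², which should approach 1/2 as N grows.
// Only meaningful for people > 0.
double handshake_ratio(std::uint64_t people) {
    double n = static_cast<double>(people);
    return static_cast<double>(count_handshakes(people)) / (n * n);
}
```

For 10 people the ratio is 0.45, for 1000 people it is 0.4995, and it keeps creeping toward 1/2.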

**Constant factors**

Usually we don't care what the specific constant factors are, because they don't affect the way the function grows. For example, two algorithms may both take `O(N)` time to complete, but one may be twice as slow as the other. We usually don't care too much unless the factor is very large, since optimizing is tricky business (see "When is optimisation premature?"); also the mere act of picking an algorithm with a better big-O will often improve performance by orders of magnitude.

Some asymptotically superior algorithms (e.g. a non-comparison `O(N log(log(N)))` sort) can have so large a constant factor (e.g. `100000*N log(log(N))`), or overhead that is relatively large like `O(N log(log(N)))` with a hidden `+ 100*N`, that they are rarely worth using even on "big data".

**Why O(N) is sometimes the best you can do, i.e. why we need datastructures**

`O(N)` algorithms are in some sense the "best" algorithms if you need to read all your data. The **very act of reading** a bunch of data is an `O(N)` operation. Loading it into memory is usually `O(N)` (or faster if you have hardware support, or no time at all if you've already read the data). However if you touch or even *look* at every piece of data (or even every other piece of data), your algorithm will take `O(N)` time to perform this looking. No matter how long your actual algorithm takes, it will be at least `O(N)` (strictly speaking, `Ω(N)`) because it spent that time looking at all the data.

The same can be said for the **very act of writing**. For example, all algorithms which print out all permutations of a number N are at least `O(N!)` (strictly speaking, `Ω(N!)`) because the output is at least that long.

This motivates the use of **data structures**: a data structure requires reading the data only once (usually `O(N)` time), plus some arbitrary amount of preprocessing (e.g. `O(N)` or `O(N log(N))` or `O(N²)`) which we try to keep small. Thereafter, modifying the data structure (insertions / deletions / etc.) and making queries on the data take very little time, such as `O(1)` or `O(log(N))`. You then proceed to make a large number of queries! In general, the more work you're willing to do ahead of time, the less work you'll have to do later on.

For example, say you had the latitude and longitude coordinates of millions of road segments, and wanted to find all street intersections.

- Naive method: If you had the coordinates of a street intersection, and wanted to examine nearby streets, you would have to go through the millions of segments each time, and check each one for adjacency.
- If you only needed to do this once, it would not be a problem to have to do the naive method of `O(N)` work only once, but if you want to do it many times (in this case, `N` times, once for each segment), we'd have to do `O(N²)` work, or 1000000²=1000000000000 operations. Not good (a modern computer can perform about a billion operations per second).
- If we use a simple structure called a hash table (an instant-speed lookup table, also known as a hashmap or dictionary), we pay a small cost by preprocessing everything in `O(N)` time. Thereafter, it only takes constant time on average to look up something by its key (in this case, our key is the latitude and longitude coordinates, rounded into a grid; we search the adjacent gridspaces of which there are only 9, which is a constant).
- Our task went from an infeasible `O(N²)` to a manageable `O(N)`, and all we had to do was pay a minor cost to make a hash table.
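The grid idea above can be sketched roughly like this. Everything here (`Point`, `GridIndex`, the key packing) is a hypothetical illustration built on `std::unordered_map`, using plain x/y coordinates rather than real latitude/longitude, and it assumes the search radius is no larger than the cell size so that 9 cells are enough.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point { double x, y; };

// Buckets points into square grid cells so that "what is near here?" only
// has to inspect the 9 cells around the query point.
class GridIndex {
public:
    explicit GridIndex(double cell_size) : cell_size_(cell_size) {}

    // O(1) on average per point, so O(N) to index N points.
    void insert(const Point& p) {
        cells_[key(cell(p.x), cell(p.y))].push_back(p);
    }

    // Returns all indexed points within `radius` of `q`.
    // Assumes radius <= cell_size, so the 9 surrounding cells suffice.
    std::vector<Point> nearby(const Point& q, double radius) const {
        std::vector<Point> result;
        for (std::int64_t dx = -1; dx <= 1; ++dx) {
            for (std::int64_t dy = -1; dy <= 1; ++dy) {
                auto it = cells_.find(key(cell(q.x) + dx, cell(q.y) + dy));
                if (it == cells_.end()) continue;
                for (const Point& p : it->second) {
                    if (std::hypot(p.x - q.x, p.y - q.y) <= radius) result.push_back(p);
                }
            }
        }
        return result;
    }

private:
    std::int64_t cell(double v) const {
        return static_cast<std::int64_t>(std::floor(v / cell_size_));
    }
    // Packs two cell coordinates into one key (assumes each fits in 32 bits).
    static std::uint64_t key(std::int64_t cx, std::int64_t cy) {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32) |
               static_cast<std::uint32_t>(cy);
    }

    double cell_size_;
    std::unordered_map<std::uint64_t, std::vector<Point>> cells_;
};
```

Building the index touches each point once, and each `nearby` query looks at a constant number of cells no matter how many points were indexed in total.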

The moral of the story: a data structure lets us speed up operations. Even more advanced data structures can let you combine, delay, or even ignore operations in incredibly clever ways, like leaving the equivalent of "to-do" notes at junctions in a tree.

**Amortized / average-case complexity**

There is also the concept of "amortized" or "average case". This is no more than using big-O notation for the expected value of a function, rather than the function itself. For example, some data structures may have a worst-case complexity of `O(N)` for a single operation, but guarantee that if you do many of these operations, the average-case complexity will be `O(1)`.
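A classic instance is an array that doubles its capacity when it fills up. The sketch below only simulates the bookkeeping (no real memory is allocated), and `copies_for_appends` is a name made up for illustration.

```cpp
#include <cstddef>

// Simulates appending `n` items to an array that doubles its capacity when
// full, and returns how many element copies the resizes cost in total.
std::size_t copies_for_appends(std::size_t n) {
    std::size_t capacity = 1, size = 0, copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            copies += size;  // a resize copies every existing element: O(N)
            capacity *= 2;
        }
        ++size;              // the append itself is O(1)
    }
    return copies;
}
```

A single append can cost `O(N)` when it triggers a resize, yet 1000 appends cause only 1023 copies in total, so the cost per append averages out to a constant.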

**Multidimensional big-O**

Most of the time, people don't realize that there's more than one variable at work. For example, in a string-search algorithm, your algorithm may take time `O([length of text] + [length of query])`, i.e. it is linear in two variables like `O(N+M)`. Other more naive algorithms may be `O([length of text]*[length of query])` or `O(N*M)`. Ignoring multiple variables is one of the most common oversights I see in algorithm analysis, and can handicap you when designing an algorithm.
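For a feel of the `O(N*M)` case, here is a sketch of the naive approach: try every start position in the text and compare character by character. `naive_find` and its comparison counter are hypothetical names for this example.

```cpp
#include <cstddef>
#include <string>

// Naive substring search. Returns the position of the first match, or
// std::string::npos. `comparisons` receives the number of character
// comparisons performed.
std::size_t naive_find(const std::string& text, const std::string& query,
                       std::size_t& comparisons) {
    comparisons = 0;
    if (query.size() > text.size()) return std::string::npos;
    for (std::size_t start = 0; start + query.size() <= text.size(); ++start) {
        std::size_t matched = 0;
        while (matched < query.size()) {
            ++comparisons;
            if (text[start + matched] != query[matched]) break;
            ++matched;
        }
        if (matched == query.size()) return start;
    }
    return std::string::npos;
}
```

Searching "aaaaaaaa" for "aab" does 18 comparisons, which is (N - M + 1) * M, so both the text length and the query length show up in the cost.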

**The whole story**

Keep in mind that big-O is not the whole story. You can drastically speed up some algorithms by using caching, making them cache-oblivious, avoiding bottlenecks by working with RAM instead of disk, using parallelization, or doing work ahead of time -- these techniques are often *independent* of the order-of-growth "big-O" notation, though you will often see the number of cores in the big-O notation of parallel algorithms.

Also keep in mind that due to hidden constraints of your program, you might not really care about asymptotic behavior. You may be working with a bounded number of values, for example:

- If you're sorting something like 5 elements, you don't want to use the speedy `O(N log(N))` quicksort; you want to use insertion sort, which happens to perform well on small inputs. These situations often come up in divide-and-conquer algorithms, where you split up the problem into smaller and smaller subproblems, such as recursive sorting, fast Fourier transforms, or matrix multiplication.
- Some values are effectively bounded due to some hidden fact (e.g. the average human name is softly bounded at perhaps 40 letters, and human age is softly bounded at around 150). You can also impose bounds on your input to effectively make terms constant.

In practice, even among algorithms which have the same or similar asymptotic performance, their relative merit may actually be driven by other things, such as: other performance factors (quicksort and mergesort are both `O(N log(N))`, but quicksort takes advantage of CPU caches); non-performance considerations, like ease of implementation; whether a library is available, and how reputable and maintained the library is.

Many things can *implicitly* contribute to the running time's *constant factor*, such as whether you run your algorithm on a 500MHz computer vs 2GHz computer, whether your programming language is interpreted or using a JIT compiler, whether you are doing a constant amount of extra work in a critical section of code, etc. The effect may be small (e.g. 0.9x speed) or large (e.g. 0.01x speed) compared to a different implementation and/or environment. Do you switch languages to eke out that little extra constant factor of work? That literally depends on a hundred other reasons (necessity, skills, coworkers, programmer productivity, the monetary value of your time, familiarity, workarounds, why not assembly or GPU, etc...), which may be more important than performance.

The above issues, like programming language, are almost never considered as part of the constant factor (nor should they be); yet one should be aware of them, because *sometimes* (though rarely) they may not be constant. For example in CPython, the native priority queue implementation is asymptotically non-optimal (`O(log(N))` rather than `O(1)` for your choice of insertion or find-min); do you use another implementation? Probably not, since the C implementation is probably faster, and there are probably other similar issues elsewhere. There are tradeoffs; sometimes they matter and sometimes they don't.

*Math addenda*

*For completeness, the precise definition of big-O notation is as follows: f(x) ∈ O(g(x)) means that "f is asymptotically upper-bounded by const*g": ignoring everything below some finite value of x, there exists a constant such that |f(x)| ≤ const * |g(x)|. (The other symbols are as follows: just like O means ≤, Ω means ≥. There are lowercase variants: o means <, and ω means >.) f(x) ∈ Ɵ(g(x)) means both f(x) ∈ O(g(x)) and f(x) ∈ Ω(g(x)) (upper- and lower-bounded by g): there exists some constants such that f will always lie in the "band" between const1*g(x) and const2*g(x). It is the strongest asymptotic statement you can make and roughly equivalent to ==. (Sorry, I elected to delay the mention of the absolute-value symbols until now, for clarity's sake; especially because I have never seen negative values come up in a computer science context.)*

*People will often use = O(...). It is technically more correct to use ∈ O(...). ∈ means "is an element of". O(N²) is actually a set of functions.*

Ok, my 2cents.

Big-O is the **rate of increase** of the resources consumed by a program, w.r.t. the problem-instance size.

Resource : Could be total-CPU time, could be maximum RAM space. By default refers to CPU time.

Say the problem is "Find the sum",

```
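// Sums the `size` elements of `arr`, visiting each element exactly once.
int Sum(int* arr, int size) {
    int sum = 0;
    while (size-- > 0)       // runs once per element: `size` iterations
        sum += arr[size];
    return sum;
}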
```

problem-instance= {5,10,15} ==> problem-instance-size = 3, iterations-in-loop= 3

problem-instance= {5,10,15,20,25} ==> problem-instance-size = 5 iterations-in-loop = 5

For input of size "n" the program performs "n" iterations over the array. Hence the Big-O is N, expressed as O(n).

Say the problem is "Find the Combination",

```
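#include <iostream>

// Prints every ordered pair of elements: `size` * `size` lines of output.
void Combination(int* arr, int size) {
    int outer = size, inner = size;
    while (outer-- > 0) {
        inner = size;        // restart the inner scan for each outer element
        while (inner-- > 0)
            std::cout << arr[outer] << "-" << arr[inner] << std::endl;
    }
}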
```

problem-instance= {5,10,15} ==> problem-instance-size = 3, total-iterations = 3*3 = 9

problem-instance= {5,10,15,20,25} ==> problem-instance-size = 5, total-iterations= 5*5 =25

For input of size "n" the program performs "n*n" iterations over the array. Hence the Big-O is N^2, expressed as O(n^2).

Big O notation is most commonly used by programmers as an approximate measure of how long a computation (algorithm) will take to complete expressed as a function of the size of the input set.

Big O is useful to compare how well two algorithms will scale up as the number of inputs is increased.

More precisely, Big O notation is used to express the asymptotic behavior of a function. That means how the function behaves as its input approaches infinity.

In many cases the "O" of an algorithm will fall into one of the following cases:

- **O(1)** - Time to complete is the same regardless of the size of the input set. An example is accessing an array element by index.
- **O(Log N)** - Time to complete increases roughly in line with log2(N). For example, 1024 items take roughly twice as long as 32 items, because Log2(1024) = 10 and Log2(32) = 5. An example is finding an item in a balanced binary search tree (BST).
- **O(N)** - Time to complete scales linearly with the size of the input set. In other words, if you double the number of items in the input set, the algorithm takes roughly twice as long. An example is counting the number of items in a linked list.
- **O(N Log N)** - Time to complete increases by the number of items times the result of Log2(N). Examples are heap sort and, on average, quick sort.
- **O(N^2)** - Time to complete is roughly proportional to the square of the number of items. An example of this is bubble sort.
- **O(N!)** - Time to complete grows as the factorial of the size of the input set. An example of this is the traveling salesman problem brute-force solution.
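As a rough illustration of how far apart these cases drift, the following Python sketch tabulates each growth function for a few input sizes. The helper name `growth_row` is hypothetical, and the numbers are abstract "units of work", not measured times.

```python
import math


def growth_row(n):
    """Return the value of each common growth function at input size n."""
    return {
        "O(1)": 1,
        "O(log N)": math.log2(n),
        "O(N)": n,
        "O(N log N)": n * math.log2(n),
        "O(N^2)": n ** 2,
        "O(N!)": math.factorial(n),
    }


for n in (4, 8, 16):
    print(n, growth_row(n))
```

Even at these tiny sizes the factorial column dwarfs everything else, which is why the brute-force cases are only usable for very small inputs.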

Big O ignores factors that do not contribute in a meaningful way to the growth curve of a function as the input size increases towards infinity. This means that constants that are added to or multiplied by the function are simply ignored.
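A small sketch of why the lower-order terms stop mattering: take a made-up exact cost of 3n^2 + 5n + 100 steps (the formula and the name `exact_cost` are assumptions chosen only for this example) and divide it by n^2 as n grows.

```python
def exact_cost(n):
    """Hypothetical exact step count: 3n^2 + 5n + 100."""
    return 3 * n**2 + 5 * n + 100


for n in (10, 1_000, 100_000):
    ratio = exact_cost(n) / n**2
    print(n, ratio)
```

The ratio settles toward the constant 3, so the n^2 term alone describes the shape of the curve. Big O then drops that leftover constant factor as well, leaving O(n^2).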

Not sure I'm further contributing to the subject but still thought I'd share: I once found this blog post to have some quite helpful (though very basic) explanations & examples on Big O:

Via examples, this helped get the bare basics into my tortoiseshell-like skull, so I think it's a pretty decent 10-minute read to get you headed in the right direction.

What is a plain English explanation of Big O? With as little formal definition as possible and simple mathematics.

**A Plain English Explanation of the Need for Big-O Notation:**

When we program, we are trying to solve a problem. What we code is called an algorithm. Big O notation allows us to compare the worst-case performance of our algorithms in a standardized way. Hardware specs vary over time, and improvements in hardware can reduce the time it takes an algorithm to run. But replacing the hardware does not mean our algorithm is any better or improved over time, as our algorithm is still the same. So in order to allow us to compare different algorithms, to determine if one is better or not, we use Big O notation.

**A Plain English Explanation of What Big O Notation is:**

Not all algorithms run in the same amount of time, and the run-time can vary based on the number of items in the input, which we'll call *n*. Based on this, we consider the worst-case analysis, or an upper bound on the run-time as *n* gets larger and larger. We must be aware of what *n* is, because many of the Big O notations reference it.

**Big O**

*f*(x) = O(*g*(x)) when x goes to a (for example, a = +∞) means that there is a function *k* such that:

- *f*(x) = *k*(x)*g*(x)
- *k* is bounded in some neighborhood of a (if a = +∞, this means that there are numbers N and M such that for every x > N, |*k*(x)| < M).

In other words, in plain English: *f*(x) = O(*g*(x)), x → a, means that in a neighborhood of a, *f* decomposes into the product of *g* and some bounded function.

**Small o**

By the way, here is for comparison the definition of small o.

*f*(x) = o(*g*(x)) when x goes to a means that there is a function k such that:

- *f*(x) = *k*(x)*g*(x)
- *k*(x) goes to 0 when x goes to a.

*Examples*

sin x = O(x) when x → 0.

sin x = O(1) when x → +∞,

x^2 + x = O(x) when x → 0,

x^2 + x = O(x^2) when x → +∞,

ln(x) = o(x) = O(x) when x → +∞.

**Attention!** The notation with the equal sign "=" uses a "fake equality": it is true that o(g(x)) = O(g(x)), but false that O(g(x)) = o(g(x)). Similarly, it is ok to write "ln(x) = o(x) when x → +∞", but the formula "o(x) = ln(x)" would make no sense.

*More examples*

O(1) = O(n) = O(n^2) when n → +∞ (but not the other way around, the equality is "fake"),

O(n) + O(n^2) = O(n^2) when n → +∞

O(O(n^2)) = O(n^2) when n → +∞

O(n^2)O(n^3) = O(n^5) when n → +∞

Here is the Wikipedia article: https://en.wikipedia.org/wiki/Big_O_notation

**Algorithm example (Java):**

```
public boolean simple_search (ArrayList<Integer> list, int key)
{
    for (Integer i : list)
    {
        if (i == key)
        {
            return true;
        }
    }
    return false;
}
```

**Algorithm description:**

This algorithm searches a list, item by item, looking for a key.

It iterates over each item in the list; if the item is the key, it returns True.

If the loop finishes without finding the key, it returns False.

*Big-O notation represents the upper bound on the complexity (time, space, ...).*

**To find The Big-O on Time Complexity:**

Calculate how much time (regarding input size) the worst case takes:

Worst-Case: the key doesn't exist in the list.

Time(Worst-Case) = 4n+1

Time: O(4n+1) = O(n) | in Big-O, constants are neglected

O(n) ~ Linear

**There's also Big-Omega, which represents a lower bound; here it is applied to the Best-Case:**

Best-Case: the key is the first item.

Time(Best-Case) = 4

Time: Ω(4) = Ω(1) ~ Instant/Constant
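The best and worst cases of the search can be checked with a short Python sketch. It counts only key comparisons, not the individual basic operations tallied above, and the name `linear_search_steps` is made up for this example.

```python
def linear_search_steps(items, key):
    """Scan left to right; return (found, number of comparisons made)."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == key:
            return True, comparisons
    return False, comparisons


data = list(range(1, 1001))
print(linear_search_steps(data, 1))    # best case: the key is the first item
print(linear_search_steps(data, -1))   # worst case: the key is not in the list
```

The best case stops after a single comparison no matter how long the list is, while the worst case needs one comparison per item.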

It is very difficult to measure the speed of software programs, and when we try, the answers can be very complex and filled with exceptions and special cases. This is a big problem, because all those exceptions and special cases are distracting and unhelpful when we want to compare two different programs with one another to find out which is "fastest".

As a result of all this unhelpful complexity, people try to describe the speed of software programs using the smallest and least complex (mathematical) expressions possible. These expressions are very very crude approximations: Although, with a bit of luck, they will capture the "essence" of whether a piece of software is fast or slow.

Because they are approximations, we use the letter "O" (Big Oh) in the expression, as a convention to signal to the reader that we are making a gross oversimplification. (And to make sure that nobody mistakenly thinks that the expression is in any way accurate).

If you read the "Oh" as meaning "on the order of" or "approximately" you will not go too far wrong. (I think the choice of the Big-Oh might have been an attempt at humour).

The only thing that these "Big-Oh" expressions try to do is to describe how much the software slows down as we increase the amount of data that the software has to process. If we double the amount of data that needs to be processed, does the software need twice as long to finish its work? Ten times as long? In practice, there are a very limited number of big-Oh expressions that you will encounter and need to worry about:

The good:

- `O(1)` **Constant**: The program takes the same time to run no matter how big the input is.
- `O(log n)` **Logarithmic**: The program run-time increases only slowly, even with big increases in the size of the input.

The bad:

- `O(n)` **Linear**: The program run-time increases proportionally to the size of the input.
- `O(n^k)` **Polynomial**: Processing time grows faster and faster, as a polynomial function, as the size of the input increases.

... and the ugly:

- `O(k^n)` **Exponential**: The program run-time increases very quickly with even moderate increases in the size of the problem; it is only practical to process small data sets with exponential algorithms.
- `O(n!)` **Factorial**: The program run-time will be longer than you can afford to wait for anything but the very smallest and most trivial-seeming datasets.
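The "what happens if we double the data?" question can be sketched directly in Python. The table `GROWTH` and the helper `slowdown_when_doubled` are invented for this illustration; they use idealised formulas with k = 2 for the polynomial and exponential cases, not timings of real programs.

```python
import math

# Idealised cost formulas for a few of the classes above (k = 2 assumed).
GROWTH = {
    "O(1)": lambda n: 1,
    "O(log n)": lambda n: math.log2(n),
    "O(n)": lambda n: n,
    "O(n^2)": lambda n: n**2,
    "O(2^n)": lambda n: 2**n,
}


def slowdown_when_doubled(name, n):
    """How many times more work is needed when the input grows from n to 2n."""
    cost = GROWTH[name]
    return cost(2 * n) / cost(n)


for name in GROWTH:
    print(name, slowdown_when_doubled(name, 16))
```

Going from 16 to 32 items barely moves the "good" classes, doubles or quadruples the "bad" ones, and multiplies the exponential one by tens of thousands.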

"

What is a plain English explanation of Big O? With as little formal definition as possible and simple mathematics."

Such a beautifully simple and short question seems at least to deserve an equally short answer, like a student might receive during tutoring.

Big O notation simply tells how much time* an algorithm can run within, in terms of only the amount of input data**.

( *in a wonderful, *unit-free* sense of time!)

(**which is what matters, because people will *always* want more, whether they live today or tomorrow)

Well, what's so wonderful about Big O notation if that's what it does?

Practically speaking, Big O analysis is *so useful and important* because Big O puts the focus squarely on the algorithm's *own* complexity and completely *ignores* anything that is merely a proportionality constant, like a JavaScript engine, the speed of a CPU, your Internet connection, and all those things which quickly become as laughably outdated as a Model *T*. Big O focuses on performance only in the way that matters equally as much to people living in the present or in the future.

Big O notation also shines a spotlight directly on the most important principle of computer programming/engineering, the fact which inspires all good programmers to keep thinking and dreaming: the only way to achieve results beyond the slow forward march of technology is to *invent a better algorithm*.

Assume we're talking about an algorithm **A**, which should do something with a dataset of size **n**.

Then `O( <some expression X involving n> )` means, in simple English:

If you're unlucky when executing A, it might take X(n) operations to complete.

As it happens, there are certain functions (think of them as *implementations* of **X(n)**) that tend to occur quite often. These are well known and easily compared (Examples: `1`, `Log N`, `N`, `N^2`, `N!`, etc.)

By comparing these when talking about **A** and other algorithms, it is easy to rank the algorithms according to the number of operations they *may* (worst-case) require to complete.

In general, our goal will be to find or structure an algorithm **A** in such a way that it will have a function `X(n)` that returns as low a number as possible.

A simple straightforward answer can be:

Big O represents the worst possible time/space for that algorithm. The algorithm will never take more space/time above that limit. Big O represents time/space complexity in the extreme case.

Here is the plain English bestiary I tend to use when explaining the common varieties of Big-O

In all cases, prefer algorithms higher up on the list to those lower on the list. However, the cost of moving to a more expensive complexity class varies significantly.

**O(1):**

No growth. Regardless of how big the problem is, you can solve it in the same amount of time. This is somewhat analogous to broadcasting, where it takes the same amount of energy to broadcast over a given distance, regardless of the number of people that lie within the broadcast range.

**O(log n):**

This complexity is the same as **O(1)** except that it's just a little bit worse. For all practical purposes, you can consider this as a very large constant scaling. The difference in work between processing 1 thousand and 1 billion items is only a factor of three, since log(10^9) / log(10^3) = 3.

**O(n):**

The cost of solving the problem is proportional to the size of the problem. If your problem doubles in size, then the cost of the solution doubles. Since most problems have to be scanned into the computer in some way, as data entry, disk reads, or network traffic, this is generally an affordable scaling factor.

**O(n log n):**

This complexity is very similar to **O(n)**. For all practical purposes, the two are equivalent. This level of complexity would generally still be considered scalable. By tweaking assumptions some

**O(n^2):**

Grows as a square, where *n* is the length of the side of a square. This is the same growth rate as the "network effect", where everyone in a network might know everyone else in the network. Growth is expensive. Most scalable solutions cannot use algorithms with this level of complexity without doing significant gymnastics. This generally applies to all other polynomial complexities - **O(n^k)** - as well.

**O(2^n):**

Does not scale. You have no hope of solving any non-trivially sized problem. Useful for knowing what to avoid, and for experts to find approximate algorithms which are in **O(n^k)**.

Big O notation is a way of describing how quickly an algorithm will run given an arbitrary number of input parameters, which we'll call "n". It is useful in computer science because different machines operate at different speeds, and simply saying that an algorithm takes 5 seconds doesn't tell you much because while you may be running a system with a 4.5 GHz octo-core processor, I may be running a 15 year old, 800 MHz system, which would take longer regardless of the algorithm. So instead of specifying how fast an algorithm runs in terms of time, we say how fast it runs in terms of number of input parameters, or "n". By describing algorithms in this way, we are able to compare the speeds of algorithms without having to take into account the speed of the computer itself.

I have a simpler way to understand time complexity. The most common metric for calculating time complexity is Big O notation. This removes all constant factors so that the running time can be estimated in relation to N as N approaches infinity. In general you can think of it like this:

```
statement;
```

Is constant. The running time of the statement will not change in relation to N

```
for ( i = 0; i < N; i++ )
    statement;
```

Is linear. The running time of the loop is directly proportional to N. When N doubles, so does the running time.

```
for ( i = 0; i < N; i++ )
{
    for ( j = 0; j < N; j++ )
        statement;
}
```

Is quadratic. The running time of the two loops is proportional to the square of N. When N doubles, the running time quadruples.

```
while ( low <= high )
{
    mid = ( low + high ) / 2;
    if ( target < list[mid] )
        high = mid - 1;
    else if ( target > list[mid] )
        low = mid + 1;
    else break;
}
```

Is logarithmic. The running time of the algorithm is proportional to the number of times N can be divided by 2. This is because the algorithm divides the working area in half with each iteration.
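To see the "number of times N can be divided by 2" claim in numbers, here is a Python sketch of the same loop that also counts its iterations. The names `binary_search_steps` and `max_steps` are assumptions made for this example; the bound floor(log2(N)) + 1 is the standard worst case for this style of binary search.

```python
import math


def binary_search_steps(sorted_items, target):
    """Return (index or -1, number of loop iterations) for a binary search."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if target < sorted_items[mid]:
            high = mid - 1
        elif target > sorted_items[mid]:
            low = mid + 1
        else:
            return mid, steps
    return -1, steps


def max_steps(n):
    """Worst-case iteration count for n items: floor(log2(n)) + 1."""
    return math.floor(math.log2(n)) + 1


for n in (1_024, 1_000_000):
    items = list(range(n))
    _, steps = binary_search_steps(items, -1)  # -1 is never in the list
    print(n, steps, max_steps(n))
```

Growing the list from about a thousand to a million items only adds roughly ten more iterations, which is what makes logarithmic algorithms so attractive.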

```
void quicksort ( int list[], int left, int right )
{
    if ( left >= right )
        return;  // base case: zero or one element is already sorted
    int pivot = partition ( list, left, right );
    quicksort ( list, left, pivot - 1 );
    quicksort ( list, pivot + 1, right );
}
```

Is N * log ( N ). The running time consists of N loops (iterative or recursive) that are logarithmic, thus the algorithm is a combination of linear and logarithmic.
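As a sanity check on the N * log ( N ) figure, the sketch below counts comparisons in a deliberately simple, out-of-place quicksort written in Python. It is not the in-place `partition` version from the snippet above, the name `quicksort_comparisons` is made up, and it always picks the first element as the pivot, which is an assumption chosen to keep the code short. That choice also exposes the well-known quadratic worst case on already-sorted input.

```python
import math
import random


def quicksort_comparisons(items):
    """Return (sorted copy, comparisons), using the first element as the pivot."""
    if len(items) <= 1:
        return list(items), 0
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    left, left_count = quicksort_comparisons(smaller)
    right, right_count = quicksort_comparisons(larger)
    # Charge one pivot comparison per remaining element at this level.
    return left + [pivot] + right, len(rest) + left_count + right_count


n = 500
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)

_, typical = quicksort_comparisons(shuffled)
_, worst = quicksort_comparisons(list(range(n)))  # already sorted: worst case here
print(typical, round(n * math.log2(n)), worst)
```

On shuffled input the count stays within a small multiple of N * log2(N), while the sorted input costs N * (N - 1) / 2 comparisons, so the N * log ( N ) label describes the typical case rather than every case.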

In general, doing something with every item in one dimension is linear, doing something with every item in two dimensions is quadratic, and dividing the working area in half is logarithmic. There are other Big O measures such as cubic, exponential, and square root, but they're not nearly as common. Big O notation is written as O ( ... ), where the expression in the parentheses is the measure. The quicksort algorithm would be described as O ( N * log ( N ) ).

Note: None of this has taken into account best, average, and worst case measures. Each would have its own Big O notation. Also note that this is a VERY simplistic explanation. Big O is the most common, but it's also more complex than I've shown. There are also other notations such as big omega, little o, and big theta. You probably won't encounter them outside of an algorithm analysis course.

If you have a suitable notion of infinity in your head, then there is a very brief description:

Big O notation tells you the cost of solving an infinitely large problem.

And furthermore

Constant factors are negligible

If you upgrade to a computer that can run your algorithm twice as fast, big O notation won't notice that. Constant factor improvements are too small to even be noticed in the scale that big O notation works with. Note that this is an intentional part of the design of big O notation.

However, anything "larger" than a constant factor can be detected.

When interested in doing computations whose size is "large" enough to be considered as approximately infinity, then big O notation is approximately the cost of solving your problem.

If the above doesn't make sense, then you don't have a compatible intuitive notion of infinity in your head, and you should probably disregard all of the above; the only way I know to make these ideas rigorous, or to explain them if they aren't already intuitively useful, is to first teach you big O notation or something similar. (although, once you well understand big O notation in the future, it may be worthwhile to revisit these ideas)

If I wanted to explain this to a six-year-old child, I would start by drawing some functions, f(x) = x and f(x) = x^2 for example, and ask the child which function will be the upper one at the top of the page. Then we would proceed with drawing and see that x^2 wins. "Who wins" is actually the function that grows faster when x tends to infinity. So "function x is in Big O of x^2" means that x grows slower than x^2 when x tends to infinity. The same can be done when x tends to 0. If we draw these two functions for x from 0 to 1, x will be the upper function, so "function x^2 is in Big O of x when x tends to 0". When the child gets older, I would add that Big O really covers functions that grow no faster than the given function, which includes growing at the same rate. Moreover, constants are discarded. So 2x is in Big O of x.

**A simple explanation**

Suppose you want to say that the running time of your algorithm increases linearly with the number of input parameters, you would write it in Big O notation as

`T(n)∈O(n).`

Read as, T of n is in Big O of n.

where

```
n=number of input parameters
T(n)=The actual running time of the algorithm
```

Or suppose if the running time of your algorithm is n². You would write it in Big O notation as

```
T(n)∈O(n²)
```

Read as, T of n is in Big O of n square

**A little more formal explanation**

Suppose that the simplified expression that gives the running time as a function of number of input parameters n, is represented as f(n).

Suppose T(n) is the real function that gives the running time as a function of the number of input parameters.

Then f(n) is considered good enough as long as the below condition is true.

```
lim T(n) ≤ c×f(n)
n→∞
```

In the function above

n=number of input parameters

T(n)= The actual running time for n input parameters (found by actually measuring the running time)

c= a constant

f(n)= The running time found from our Big O notation

The equation is read as: as n approaches infinity, T of n is less than or equal to c times f of n.

In big O notation this is written as

```
T(n)∈O(f(n))
```

This is read as T of n is in big O of f of n.

Note: If you say your algorithm is a Big O of n, it means it is a function of n or faster. So that means that if your algorithm is Big O of n, then it is also automatically the Big O of n square.

Big O of n means my algorithm runs at least as fast as this. It just gives the upper bounds.

Check this out for a video tutorial on Big O from UC Berkeley.

This is a very simplified explanation, but I hope it covers most important details.

Let's say your algorithm dealing with the problem depends on some 'factors', for example let's make it N and X.

Depending on N and X, your algorithm will require some operations, for example in the WORST case it's `3(N^2) + log(X)` operations.

Since Big-O doesn't care too much about constant factors (such as the 3), the Big-O of your algorithm is `O(N^2 + log(X))`. It basically translates to 'the number of operations your algorithm needs in the worst case scales with this'.

Say you order Harry Potter: Complete 8-Film Collection [Blu-ray] from Amazon and download the same film collection online at the same time. You want to test which method is faster. The delivery takes almost a day to arrive and the download completed about 30 minutes earlier. Great! So it’s a tight race.

What if I order several Blu-ray movies like The Lord of the Rings, Twilight, The Dark Knight Trilogy, etc. and download all the movies online at the same time? This time, the delivery still takes a day to complete, but the online download takes 3 days to finish. For online shopping, the number of purchased items (input) doesn’t affect the delivery time. The output is constant. We call this **O(1)**.

For online downloading, the download time is directly proportional to the movie file sizes (input). We call this **O(n)**.

From the experiments, we know that online shopping scales better than online downloading. It is very important to understand big O notation because it helps you to analyze the **scalability** and **efficiency** of algorithms.

**Note:** Big O notation represents the **worst-case scenario** of an algorithm. Let’s assume that **O(1)** and **O(n)** are the worst-case scenarios of the example above.
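The two delivery methods can be turned into a toy model in Python. The constants below are assumptions read off the story (delivery in about 24 hours, one collection downloading in about 23.5 hours), and the function names are made up for this sketch.

```python
DELIVERY_HOURS = 24.0           # assumed: about a day, whatever the order size
DOWNLOAD_HOURS_PER_SET = 23.5   # assumed: one collection finished ~30 minutes earlier


def delivery_time(collections):
    """O(1): the shipping time does not depend on how many collections are ordered."""
    return DELIVERY_HOURS


def download_time(collections):
    """O(n): the download time grows in proportion to the amount of data."""
    return DOWNLOAD_HOURS_PER_SET * collections


for collections in (1, 3, 10):
    print(collections, delivery_time(collections), download_time(collections))
```

With one collection the download narrowly wins, but from a few collections onward the constant-time delivery is faster, and the gap keeps widening.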

**Reference** : http://carlcheo.com/compsci

You want to know all there is to know of big O? So do I.

So to talk of big O, I will use words that have just one beat in them. One sound per word. Small words are quick. You know these words, and so do I. We will use words with one sound. They are small. I am sure you will know all of the words we will use!

Now, let’s you and me talk of work. Most of the time, I do not like work. Do you like work? It may be the case that you do, but I am sure I do not.

I do not like to go to work. I do not like to spend time at work. If I had my way, I would like just to play, and do fun things. Do you feel the same as I do?

Now at times, I do have to go to work. It is sad, but true. So, when I am at work, I have a rule: I try to do less work. As near to no work as I can. Then I go play!

So here is the big news: the big O can help me not to do work! I can play more of the time, if I know big O. Less work, more play! That is what big O helps me do.

Now I have some work. I have this list: one, two, three, four, five, six. I must add all things in this list.

Wow, I hate work. But oh well, I have to do this. So here I go.

One plus two is three… plus three is six... and four is... I don’t know. I got lost. It is too hard for me to do in my head. I don’t much care for this kind of work.

So let's not do the work. Let's you and me just think how hard it is. How much work would I have to do, to add six numbers?

Well, let’s see. I must add one and two, and then add that to three, and then add that to four… All in all, I count six adds. I have to do six adds to solve this.

Here comes big O, to tell us just how hard this math is.

Big O says: we must do six adds to solve this. One add, for each thing from one to six. Six small bits of work... each bit of work is one add.

Well, I will not do the work to add them now. But I know how hard it would be. It would be six adds.

Oh no, now I have more work. Sheesh. Who makes this kind of stuff?!

Now they ask me to add from one to ten! Why would I do that? I did not want to add one to six. To add from one to ten… well… that would be even more hard!

How much more hard would it be? How much more work would I have to do? Do I need more or less steps?

Well, I guess I would have to do ten adds… one for each thing from one to ten. Ten is more than six. I would have to work that much more to add from one to ten, than one to six!

I do not want to add right now. I just want to think on how hard it might be to add that much. And, I hope, to play as soon as I can.

To add from one to six, that is some work. But do you see, to add from one to ten, that is more work?

Big O is your friend and mine. Big O helps us think on how much work we have to do, so we can plan. And, if we are friends with big O, he can help us choose work that is not so hard!

Now we must do new work. Oh, no. I don’t like this work thing at all.

The new work is: add all things from one to n.

Wait! What is n? Did I miss that? How can I add from one to n if you don’t tell me what n is?

Well, I don’t know what n is. I was not told. Were you? No? Oh well. So we can’t do the work. Whew.

But though we will not do the work now, we can guess how hard it would be, if we knew n. We would have to add up n things, right? Of course!

Now here comes big O, and he will tell us how hard this work is. He says: to add all things from one to N, one by one, is O(n). To add all these things, [I know I must add n times.][1] That is big O! He tells us how hard it is to do some type of work.

To me, I think of big O like a big, slow, boss man. He thinks on work, but he does not do it. He might say, "That work is quick." Or, he might say, "That work is so slow and hard!" But he does not do the work. He just looks at the work, and then he tells us how much time it might take.

I care lots for big O. Why? I do not like to work! No one likes to work. That is why we all love big O! He tells us how fast we can work. He helps us think of how hard work is.

Uh oh, more work. Now, let’s not do the work. But, let’s make a plan to do it, step by step.

They gave us a deck of ten cards. They are all mixed up: seven, four, two, six… not straight at all. And now... our job is to sort them.

Ergh. That sounds like a lot of work!

How can we sort this deck? I have a plan.

I will look at each pair of cards, pair by pair, through the deck, from first to last. If the first card in one pair is big and the next card in that pair is small, I swap them. Else, I go to the next pair, and so on and so on... and soon, the deck is done.

When the deck is done, I ask: did I swap cards in that pass? If so, I must do it all once more, from the top.

At some point, at some time, there will be no swaps, and our sort of the deck would be done. So much work!

Well, how much work would that be, to sort the cards with those rules?

I have ten cards. And, most of the time -- that is, if I don’t have lots of luck -- I must go through the whole deck up to ten times, with up to ten card swaps each time through the deck.

Big O, help me!

Big O comes in and says: for a deck of n cards, to sort it this way will be done in O(N squared) time.

Why does he say n squared?

Well, you know n squared is n times n. Now, I get it: n cards checked, up to what might be n times through the deck. That is two loops, each with n steps. That is n squared much work to be done. A lot of work, for sure!

Now when big O says it will take O(n squared) work, he does not mean n squared adds, on the nose. It might be some small bit less, for some case. But in the worst case, it will be near n squared steps of work to sort the deck.

Now here is where big O is our friend.

Big O points out this: as n gets big, when we sort cards, the job gets MUCH MUCH MORE HARD than the old just-add-these-things job. How do we know this?

Well, if n gets real big, we do not care what we might add to n or n squared.

For big n, n squared is more large than n.

Big O tells us that to sort things is more hard than to add things. O(n squared) is more than O(n) for big n. That means: if n gets real big, to sort a mixed deck of n things MUST take more time, than to just add n mixed things.

Big O does not solve the work for us. Big O tells us how hard the work is.

I have a deck of cards. I did sort them. You helped. Thanks.

Is there a more fast way to sort the cards? Can big O help us?

Yes, there is a more fast way! It takes some time to learn, but it works... and it works quite fast. You can try it too, but take your time with each step and do not lose your place.

In this new way to sort a deck, we do not check pairs of cards the way we did a while ago. Here are your new rules to sort this deck:

One: I choose one card in the part of the deck we work on now. You can choose one for me if you like. (The first time we do this, “the part of the deck we work on now” is the whole deck, of course.)

Two: I splay the deck on that card you chose. What is this splay; how do I splay? Well, I go from the start card down, one by one, and I look for a card that is more high than the splay card.

Three: I go from the end card up, and I look for a card that is more low than the splay card.

Once I have found these two cards, I swap them, and go on to look for more cards to swap. That is, I go back to step Two, and splay on the card you chose some more.

At some point, this loop (from Two to Three) will end. It ends when both halves of this search meet at the splay card. Then, we have just splayed the deck with the card you chose in step One. Now, all the cards near the start are more low than the splay card; and the cards near the end are more high than the splay card. Cool trick!

Four (and this is the fun part): I have two small decks now, one more low than the splay card, and one more high. Now I go to step one, on each small deck! That is to say, I start from step One on the first small deck, and when that work is done, I start from step One on the next small deck.

I break up the deck in parts, and sort each part, more small and more small, and at some time I have no more work to do. Now this may seem slow, with all the rules. But trust me, it is not slow at all. It is much less work than the first way to sort things!

What is this sort called? It is called Quick Sort! That sort was made by a man called C. A. R. Hoare and he called it Quick Sort. Now, Quick Sort gets used all the time!

Quick Sort breaks up big decks in small ones. That is to say, it breaks up big tasks in small ones.

Hmmm. There may be a rule in there, I think. To make big tasks small, break them up.

This sort is quite quick. How quick? Big O tells us: this sort needs O(n log n) work to be done, in the mean case.

Is it more or less fast than the first sort? Big O, please help!

The first sort was O(n squared). But Quick Sort is O(n log n). You know that n log n is less than n squared, for big n, right? Well, that is how we know that Quick Sort is fast!

If you have to sort a deck, what is the best way? Well, you can do what you want, but I would choose Quick Sort.

Why do I choose Quick Sort? I do not like to work, of course! I want work done as soon as I can get it done.

How do I know Quick Sort is less work? I know that O(n log n) is less than O(n squared). The O's are more small, so Quick Sort is less work!

Now you know my friend, Big O. He helps us do less work. And if you know big O, you can do less work too!

You learned all that with me! You are so smart! Thank you so much!

Now that work is done, let’s go play!

[1]: There is a way to cheat and add all the things from one to n, all at one time. Some kid named Gauss found this out when he was eight. I am not that smart though, so don't ask me how he did it.

Big O in plain English is like <= (less than or equal). When we say for two functions f and g, f = O(g), it means that f <= g.

However, this does not mean that f(n) <= g(n) for every n. What it actually means is that **f is less than or equal to g in terms of growth**. It means that **after a point** f(n) <= c*g(n), where **c is a constant**. And **after a point** means for all n >= n0, where n0 **is another constant**.

Big O is a means to represent the upper bounds of any function. We generally use it for expressing the upper bounds of a function that tells the running time of an Algorithm.

*Ex : f(n) = 2(n^2) +3n be a function representing the running time of a hypothetical algorithm, Big-O notation essentially gives the upper limit for this function which is O(n^2)*

This notation basically tells us that, for any sufficiently large input 'n', the running time won't be greater than a constant multiple of the value expressed by Big-O notation.

Also, agree with all the above detailed answers. Hope this helps !!

Big O is describing a class of functions.

It describes how fast functions grow for big input values.

For a given function f, O(f) describes all functions g(n) for which you can find an n0 and a constant c so that all values of g(n) with n >= n0 are less than or equal to c*f(n).

In less mathematical words O(f) is a set of functions. Namely all functions, that from some value n0 onwards, are growing slower or as fast as f.

If f(n) = n then

g(n) = 3n is in O(f), because constant factors do not matter.

h(n) = n + 1000 is in O(f): it may exceed a constant multiple of f (say 2n) for all values of n smaller than 1000, but for big O only huge inputs matter.

However, i(n) = n^2 is not in O(f), because a quadratic function grows faster than a linear one.
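These three cases can be spot-checked in Python. The helper `fits` and the chosen constants are assumptions made for this sketch, and checking up to a finite `limit` only illustrates the definition; it does not prove membership.

```python
def fits(g, f, c, n0, limit=100_000):
    """Check g(n) <= c * f(n) for every n with n0 <= n <= limit (finite spot check)."""
    return all(g(n) <= c * f(n) for n in range(n0, limit + 1))


def f(n):
    return n


def g(n):
    return 3 * n


def h(n):
    return n + 1000


def i(n):
    return n * n


print(fits(g, f, c=3, n0=1))       # True: the constant factor 3 is absorbed by c
print(fits(h, f, c=2, n0=1000))    # True: n + 1000 <= 2n once n >= 1000
print(fits(h, f, c=2, n0=1))       # False: the bound fails for n below 1000
print(fits(i, f, c=1000, n0=1))    # False: n*n exceeds 1000*n as soon as n > 1000
```

For i(n) = n^2 the last check fails for any constant you try: whatever c you pick, n^2 > c*n as soon as n > c.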

I found a really great explanation of Big O notation, especially for someone who's not much into mathematics.

https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/

Big O notation is used in Computer Science to describe the performance or complexity of an algorithm. Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.

Anyone who's read Programming Pearls or any other Computer Science books and doesn’t have a grounding in Mathematics will have hit a wall when they reached chapters that mention O(N log N) or other seemingly crazy syntax. Hopefully this article will help you gain an understanding of the basics of Big O and Logarithms.

As a programmer first and a mathematician second (or maybe third or fourth) I found the best way to understand Big O thoroughly was to produce some examples in code. So, below are some common orders of growth along with descriptions and examples where possible.

O(1)

O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set.

```
bool IsFirstElementNull(IList<string> elements) {
    return elements[0] == null;
}
```

O(N)

O(N) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set. The example below also demonstrates how Big O favours the worst-case performance scenario; a matching string could be found during any iteration of the for loop and the function would return early, but Big O notation will always assume the upper limit where the algorithm will perform the maximum number of iterations.

```
bool ContainsValue(IList<string> elements, string value) {
    foreach (var element in elements)
    {
        if (element == value) return true;
    }

    return false;
}
```

O(N^{2})

O(N^{2}) represents an algorithm whose performance is directly proportional to the square of the size of the input data set. This is common with algorithms that involve nested iterations over the data set. Deeper nested iterations will result in O(N^{3}), O(N^{4}) etc.

```
bool ContainsDuplicates(IList<string> elements) {
    for (var outer = 0; outer < elements.Count; outer++)
    {
        for (var inner = 0; inner < elements.Count; inner++)
        {
            // Don't compare with self
            if (outer == inner) continue;

            if (elements[outer] == elements[inner]) return true;
        }
    }

    return false;
}
```

O(2^{N})

O(2^{N}) denotes an algorithm whose growth doubles with each addition to the input data set. The growth curve of an O(2^{N}) function is exponential - starting off very shallow, then rising meteorically. An example of an O(2^{N}) function is the recursive calculation of Fibonacci numbers:

```
int Fibonacci(int number) {
    if (number <= 1) return number;

    return Fibonacci(number - 2) + Fibonacci(number - 1);
}
```
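To see the blow-up for yourself, you can count how many calls the naive recursion makes. This is a Python sketch rather than the article's C#, and the helper name `fib_calls` is made up here; it mirrors the recursion above and also returns the number of calls.

```python
def fib_calls(n):
    """Return (fib(n), number of calls made) for the naive recursion."""
    if n <= 1:
        return n, 1
    a, calls_a = fib_calls(n - 2)
    b, calls_b = fib_calls(n - 1)
    # One call for this invocation plus everything the two sub-calls did.
    return a + b, calls_a + calls_b + 1


for n in (5, 10, 20):
    print(n, fib_calls(n))
# 5 (5, 15)
# 10 (55, 177)
# 20 (6765, 21891)
```

Doubling the input from 10 to 20 multiplies the work by more than a hundred. The call count does not quite double with each step, but 2^{N} is a valid upper bound on it.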

Logarithms

Logarithms are slightly trickier to explain so I'll use a common example:

Binary search is a technique used to search sorted data sets. It works by selecting the middle element of the data set, essentially the median, and compares it against a target value. If the values match it will return success. If the target value is higher than the value of the probe element it will take the upper half of the data set and perform the same operation against it. Likewise, if the target value is lower than the value of the probe element it will perform the operation against the lower half. It will continue to halve the data set with each iteration until the value has been found or until it can no longer split the data set.

This type of algorithm is described as O(log N). The iterative halving of data sets described in the binary search example produces a growth curve that peaks at the beginning and slowly flattens out as the size of the data set increases, e.g. an input data set containing 10 items takes one second to complete, a data set containing 100 items takes two seconds, and a data set containing 1000 items will take three seconds. Doubling the size of the input data set has little effect on its growth, because after a single iteration of the algorithm the data set will be halved and therefore on a par with an input data set half the size. This makes algorithms like binary search extremely efficient when dealing with large data sets.
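The description above can be sketched in Python as follows. The function name and the probe counter are additions for illustration; the counter is there only to show how slowly the number of steps grows with the size of the data set.

```python
def binary_search(items, target):
    """Search a sorted list; return (index or -1, number of probes made)."""
    lo, hi = 0, len(items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2          # probe the middle element
        probes += 1
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            lo = mid + 1              # target is higher: keep the upper half
        else:
            hi = mid - 1              # target is lower: keep the lower half
    return -1, probes


for size in (10, 1_000, 1_000_000):
    data = list(range(size))
    # Search for a value larger than everything in the list (a worst case).
    _, probes = binary_search(data, size)
    print(size, probes)
# 10 4
# 1000 10
# 1000000 20
```

Going from a thousand items to a million only doubles the number of probes, which is what O(log N) looks like in practice.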

**algorithm**: procedure/formula for solving a problem

**How do we analyze algorithms, and how can we compare algorithms against each other?**

*example:* you and a friend are asked to create a function to sum the numbers from 0 to N. You come up with f(x) and your friend comes up with g(x). Both functions produce the same result, but use different algorithms. In order to objectively compare the efficiency of the algorithms we use **Big-O notation**.
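As a sketch of what two such functions might look like in Python: the names `sum_loop` and `sum_formula` and the choice of algorithms are assumptions, since the example only calls them f(x) and g(x) without saying how each one works.

```python
def sum_loop(n):
    """Add the numbers one at a time: the work grows with n, so O(n)."""
    total = 0
    for k in range(n + 1):
        total += k
    return total


def sum_formula(n):
    """Use the closed form n(n+1)/2: a fixed number of operations, so O(1)."""
    return n * (n + 1) // 2


print(sum_loop(100), sum_formula(100))  # 5050 5050
```

Both return the same answer, but the loop does more work as N grows while the formula does not. Big-O notation captures exactly that difference.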

**Big-O notation:** describes *how quickly runtime will grow relative to the input as the input gets arbitrarily large.*

**3 key takeaways:**

- Compare *how quickly runtime grows*, **NOT** *exact runtimes* (which depend on hardware)
- Only concerned with runtime growth relative to the input **(n)**
- As *n* gets arbitrarily large, focus on the terms that will grow the fastest (think infinity), AKA **asymptotic analysis**

**Space complexity:** aside from time complexity, we also care about space complexity (how much memory/space an algorithm uses). Instead of checking the time of operations, we check the size of the allocation of memory.
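For instance, here are two ways to reverse a list in Python that differ in memory use rather than in time. Both function names are made up for this sketch.

```python
def reversed_copy(items):
    """O(n) extra space: builds a second list as long as the input."""
    return items[::-1]


def reverse_in_place(items):
    """O(1) extra space: swaps elements inside the same list."""
    lo, hi = 0, len(items) - 1
    while lo < hi:
        items[lo], items[hi] = items[hi], items[lo]
        lo += 1
        hi -= 1
    return items


print(reversed_copy([1, 2, 3, 4]))     # [4, 3, 2, 1]
print(reverse_in_place([1, 2, 3, 4]))  # [4, 3, 2, 1]
```

Both take O(n) time, but the first allocates memory proportional to the input while the second needs only two index variables.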
