Permute 2 2.2.8

Permute 2.2.8 Mac cracked version. Posted on 27 November 2016 by zwx0709. Today I am sharing Permute, a very convenient and practical video format conversion app for the Mac that makes it easy to convert video formats.

2.2.8 The summarise Function - to do some calculations on the data. 2.2.9 Two Other Useful Functions. 2.2.10 Last but not least - Pipes (%% ) to make your code efficient.

Subsection 2.2.1 Ordering Things

A number of applications of the rule of products are of a specific type, and because of their frequent appearance they are given their own designation, permutations. Consider the following examples.

In each of the above examples of the rule of products we observe that:

  1. We are asked to order or arrange elements from a single set.

  2. Each element is listed exactly once in each list (permutation). So if there are \(n\) choices for position one in a list, there are \(n - 1\) choices for position two, \(n - 2\) choices for position three, etc.

We now develop notation that will be useful for permutation problems.

The first few factorials are

\begin{equation*}
\begin{array}{ccccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\
n! & 1 & 1 & 2 & 6 & 24 & 120 & 720 & 5040
\end{array}\text{.}
\end{equation*}

Note that \(4!\) is 4 times \(3!\text{,}\) or 24, and \(5!\) is 5 times \(4!\text{,}\) or 120. In addition, note that as \(n\) grows in size, \(n!\) grows extremely quickly. For example, \(11! = 39916800\text{.}\) If the answer to a problem happens to be \(25!\text{,}\) as in the previous example, you would never be expected to write that number out completely. However, a problem with an answer of \(\frac{25!}{23!}\) can be reduced to \(25 \cdot 24\text{,}\) or 600.

If \(\lvert A \rvert = n\text{,}\) there are \(n!\) ways of permuting all \(n\) elements of \(A\). We next consider the more general situation where we would like to permute \(k\) elements out of a set of \(n\) objects, where \(k \leq n\text{.}\)
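As a reminder of the standard result (the symbol \(P(n,k)\) is an assumption here, since the text above does not fix a notation): there are \(n\) choices for the first position, \(n - 1\) for the second, and so on down to \(n - k + 1\) for the \(k\)th, so the rule of products gives

```latex
\begin{equation*}
P(n,k) = n \cdot (n-1) \cdots (n-k+1) = \frac{n!}{(n-k)!}\text{.}
\end{equation*}
```

For instance, \(P(25,2) = \frac{25!}{23!} = 25 \cdot 24 = 600\text{,}\) which matches the reduction noted earlier.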

It is important to note that the derivation of the permutation formula given above was done solely through the rule of products. This serves to reiterate our introductory remarks in this section that permutation problems are really rule-of-products problems. We close this section with several examples.

Previously we looked at how you can use functions to simplify your code. Ideally you have a function that performs a single operation, and now you want to use it many times to do the same operation on lots of different data. The naive way to do that would be something like this:
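A minimal sketch of that naive approach. The function summarise.site and the three vectors are hypothetical, invented only for illustration:

```r
# Hypothetical helper: summarise one vector of measurements.
summarise.site <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

# Made-up measurements for three sites.
site.a <- c(4.1, 5.0, 5.9)
site.b <- c(2.0, 3.0, 4.0)
site.c <- c(7.5, 8.5, 9.5)

# The naive way: one hand-written call per data set.
result.a <- summarise.site(site.a)
result.b <- summarise.site(site.b)
result.c <- summarise.site(site.c)
```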

But this isn’t very nice. Yes, by using a function, you have reduced a substantial amount of repetition. That is nice. But there is still repetition. Repeating yourself will cost you time, both now and later, and potentially introduce some nasty bugs. When it comes to repetition, well, just don’t.

The nice way of repeating elements of code is to use a loop of some sort. A loop is a coding structure that reruns the same bit of code over and over, but with only small fragments differing between runs. In R there is a whole family of looping functions, each with their own strengths.

First, it is good to recognise that most operations that involve looping are instances of the split-apply-combine strategy (this term and idea comes from the prolific Hadley Wickham, who coined the term in this paper). You start with a bunch of data. Then you Split it up into many smaller datasets, Apply a function to each piece, and finally Combine the results back together.

Some data arrives already in its pieces - e.g. output files from a leaf scanner or temperature machine. Your job is then to analyse each bit, and put them together into a larger data set.

Sometimes the combine phase means making a new data frame, other times it might mean something more abstract, like combining a bunch of plots in a report.

Either way, the challenge for you is to identify the pieces that remain the same between different runs of your function, then structure your analysis around that.

Ok, you got me, we are starting with for loops. But not in the way you think.

When you mention looping, many people immediately reach for for. Perhaps that’s because, like me, they are already familiar with it from other languages, like BASIC, Python, Perl, C, C++ or MATLAB. While for is definitely the most flexible of the looping options, we suggest you avoid it wherever you can, for the following two reasons:

  1. It is not very expressive, i.e. takes a lot of code to do what you want.
  2. It permits you to write horrible code, like this example from my earlier work:

The main problems with this code are that

  • it is hard to read
  • all the variables are stored in the global scope, which is dangerous.

All it’s doing is making a plot! Compare that to something like this
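A sketch of what such a call could look like. The data frame, its columns and the function make.plot are all hypothetical stand-ins, since the original code is not reproduced here:

```r
# Made-up data: sizes of individuals from three species.
dat <- data.frame(
  species = rep(c("A", "B", "C"), each = 4),
  age     = rep(1:4, times = 3),
  size    = c(1, 2, 3, 4, 2, 4, 6, 8, 1, 3, 5, 7),
  stringsAsFactors = FALSE
)

# Hypothetical plotting function: all the detail lives in here.
make.plot <- function(sp, data) {
  sub <- data[data$species == sp, ]
  plot(size ~ age, data = sub, main = sp, type = "b")
  invisible(nrow(sub))  # return the number of points drawn
}

pdf(NULL)  # draw to a null device so the sketch runs anywhere; drop this to see the plots
n.points <- lapply(unique(dat$species), make.plot, data = dat)
invisible(dev.off())
```

The loop itself is the single lapply line; everything specific to the plot is hidden inside make.plot.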

That’s much nicer! It’s obvious what the loop does, and no new variables are created. Of course, for the code to work, we need to define the function

which actually makes our plot, but having all that detail off in a function has many benefits. Most of all it makes your code more reliable and easier to read. Of course you could do this easily with for loops too:

but the temptation with for loops is often to cram a little extra code in each iteration, rather than stepping back and thinking about what you’re trying to achieve.

So our reason for avoiding for loops, and the similar functions while and repeat, is that the other looping functions, like lapply, demand that you write nicer code, so that’s what we’ll focus on first.

There are several related functions in R which allow you to apply some function to a series of objects (e.g. vectors, matrices, data frames or files). They include:

  • lapply
  • sapply
  • tapply
  • aggregate
  • mapply
  • apply.

Each repeats a function or operation on a series of elements, but they differ in the data types they accept and return. What they all have in common is that the order of iteration is not important. This is crucial. If each iteration is independent, then you can cycle through them in whatever order you like. Generally, we argue that you should only use the generic looping functions for, while, and repeat when the order of operations is important. Otherwise reach for one of the apply tools.

lapply applies a function to each element of a list (or vector), collecting results in a list. sapply does the same, but will try to simplify the output if possible.

Lists are a very powerful and flexible data structure that few people seem to know about. Moreover, they are the building block for other data structures, like data.frame. To access elements of a list, you use the double square bracket, for example X[[4]] returns the fourth element of the list X. If you don’t know what a list is, we suggest you read more about them before you proceed.

Basic syntax

Here X is a list or vector, containing the elements that form the input to the function f. This code will also return a list, stored in result, with the same number of elements as X.
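A concrete instance of the call described above, using the same names X, f and result. The contents of X and the body of f are made up for illustration:

```r
# X: a list of inputs; f: the function applied to each element.
X <- list(a = 1:3, b = 4:6, c = 7:9)
f <- function(x) sum(x)

result <- lapply(X, f)      # always a list, one element per element of X
simplified <- sapply(X, f)  # same values, simplified to a named vector

result[["b"]]  # double brackets pull out a single element
```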

Usage

lapply is great for building analysis pipelines, where you want to repeat a series of steps on a large number of similar objects. The way to do this is to have a series of lapply statements, with the output of one providing the input to another:
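A small sketch of such a chain, where each step consumes the output of the previous one. The input data and the three processing steps are invented for illustration:

```r
# Made-up raw input: three batches of readings stored as text.
raw <- list(c("1", "2", "3"), c("10", "20"), c("5", "5", "5", "5"))

first.step  <- lapply(raw, as.numeric)                      # parse each batch
second.step <- lapply(first.step, function(x) x / sum(x))   # rescale each batch
third.step  <- sapply(second.step, max)                     # summarise each batch
```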

The challenge is to identify the parts of your analysis that stay the same and those that differ for each call of the function. The trick to using lapply is to recognise that only one item can differ between different function calls.

It is possible to pass in a bunch of additional arguments to your function, but these must be the same for each call of your function. For example, let’s say we have a function test which takes the path of a file, loads the data, and tests it against some hypothesised value H0. We can run the function on the file “myfile.csv” as follows.

We could then run the test on a bunch of files using lapply:

But notice that in this example, the only thing that differs between the runs is a single number in the file name. So we could save ourselves typing these by adding an extra step to generate the file names

The nice thing about that piece of code is that it would extend as far as we wanted, to 10000000 files, if needed.
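The steps above can be sketched as follows. The body of test (a one-sample t-test on a column named x) and the numbered file names are assumptions made for this sketch, and the data files are generated in a temporary directory only so that the code runs:

```r
set.seed(1)

# Create three small made-up data files in a temporary directory so the
# sketch is runnable; in practice the files would already exist.
data.dir <- tempdir()
for (i in 1:3) {
  write.csv(data.frame(x = rnorm(20, mean = i)),
            file.path(data.dir, paste0("myfile", i, ".csv")),
            row.names = FALSE)
}

# Hypothetical test function: load one file and test its values against H0.
test <- function(filename, H0) {
  d <- read.csv(filename)
  t.test(d$x, mu = H0)$p.value
}

# One file:
single <- test(file.path(data.dir, "myfile1.csv"), H0 = 1)

# Many files: generate the names, then let lapply do the repetition.
# H0 is passed unchanged to every call; only the file name differs.
files <- file.path(data.dir, paste0("myfile", 1:3, ".csv"))
result <- lapply(files, test, H0 = 1)
```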

Example - plotting temperature for many sites using open weather data

Let’s look at the weather in some eastern Australian cities over the last couple of days. The website openweathermap.com provides access to all sorts of neat data, lots of it essentially real time. We’ve parcelled up some on the nicercode website to use. In theory, this sort of analysis script could use the weather data directly, but we don’t want to hammer their website too badly. The code used to generate these files is here.

We want to look at the temperatures over the last few days for the cities

The data are stored in a url scheme where the Sydney data is at http://nicercode.github.io/guides/repeating-things/data/Sydney.csv and so on.

The URLs that we need are therefore:

We can write a function to download a file if it does not exist:

and then run that over the urls:

Notice that we never specify the order in which the files are downloaded; we just say “apply this function (download.maybe) to this list of urls”. We also pass the path argument to every function call. So it was as if we’d written
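A hedged sketch of the download step. The name download.maybe and its path argument come from the text; the refetch argument, the function body and the second city name are assumptions. So that the sketch runs offline, it points at local file:// stand-ins rather than the real site:

```r
# Download a file into `path` unless it is already there.
download.maybe <- function(url, refetch = FALSE, path = ".") {
  dest <- file.path(path, basename(url))
  if (refetch || !file.exists(dest)) {
    download.file(url, dest, quiet = TRUE)
  }
  dest
}

# Stand-in "remote" files so the sketch runs offline. With a network
# connection you would instead build the real URLs, e.g.
#   paste0("http://nicercode.github.io/guides/repeating-things/data/",
#          cities, ".csv")
cities <- c("Sydney", "Melbourne")  # only Sydney is named in the text
remote <- file.path(tempdir(), "remote")
dir.create(remote, showWarnings = FALSE)
for (city in cities) {
  writeLines("time,temperature", file.path(remote, paste0(city, ".csv")))
}
urls <- paste0("file://",
               normalizePath(file.path(remote, paste0(cities, ".csv"))))

# Apply the function to every URL; `path` is passed to each call.
path <- file.path(tempdir(), "data")
dir.create(path, showWarnings = FALSE)
files <- sapply(urls, download.maybe, path = path)
```

Running the last line a second time downloads nothing, because every destination file already exists.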

but much less boring, and scalable to more files.

The first column of each file, time, is a string representing date and time, which needs processing into R’s native time format (dealing with times in R (or frankly, in any language) is a complete pain). In a real case, there might be many steps involved in processing each file. We can make a function like this:

that reads in a file given a filename, and then apply that function to each filename using lapply:

We now have a list, where each element is a data.frame of weather data:

We can use lapply or sapply to easily ask the same question of each element of this list. For example, how many rows of data are there?
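A sketch of such a loading function and its use. The name load.file, the temperature column, the time format string and the second city are assumptions; the two small files are written to a temporary directory only so the example runs:

```r
# Write two small made-up weather files so the sketch is runnable.
cities <- c("Sydney", "Melbourne")  # only Sydney is named in the text
path <- tempdir()
for (city in cities) {
  d <- data.frame(time = c("2013-06-13 23:00:00", "2013-06-14 00:00:00"),
                  temperature = c(12.5, 11.8))
  write.csv(d, file.path(path, paste0(city, ".csv")), row.names = FALSE)
}

# Read one file and convert its time column from string to date-time.
load.file <- function(filename) {
  d <- read.csv(filename, stringsAsFactors = FALSE)
  d$time <- as.POSIXct(d$time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  d
}

files <- file.path(path, paste0(cities, ".csv"))
data <- lapply(files, load.file)
names(data) <- cities
```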

What is the hottest temperature recorded by city?

or, estimate the autocorrelation function for each set:
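The three questions above can be sketched like this. The list data is filled with made-up temperatures (not real weather data), and the column name temperature is an assumption:

```r
set.seed(42)

# Made-up hourly temperatures for two cities (not real weather data).
data <- list(
  Sydney    = data.frame(temperature = 15 + cumsum(rnorm(48, sd = 0.5))),
  Melbourne = data.frame(temperature = 11 + cumsum(rnorm(24, sd = 0.5)))
)

n.rows  <- sapply(data, nrow)                             # rows per city
hottest <- sapply(data, function(x) max(x$temperature))   # maximum per city
autocor <- lapply(data, function(x) acf(x$temperature, plot = FALSE))
```

The first two results simplify to named vectors, so sapply is convenient; an acf object does not simplify, so lapply keeps each one as a list element.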

I find that for loops can be easier for plotting data, partly because there is nothing to collect (or combine) at each iteration.

Parallelising your code

Another great feature of lapply is that it makes it really easy to parallelise your code. All computers now contain multiple CPUs, and these can all be put to work using the great multicore package.

In the case above, we had naturally “split” data; we had a vector of city names that led to a list of different data.frames of weather data. Sometimes the “split” operation depends on a factor. For example, you might have an experiment where you measured the size of plants at different levels of added fertiliser - you then want to know the mean height as a function of this treatment.
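A hedged sketch of the idea. It uses mclapply from the parallel package that ships with R, which provides the same drop-in replacement for lapply that multicore introduced; this is a substitution rather than the original code. The function slow.square and the choice of two cores are assumptions:

```r
library(parallel)

# Hypothetical slow function standing in for a real analysis step.
slow.square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

# mclapply forks processes, which Windows does not support, so fall
# back to a single core there.
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial      <- lapply(1:8, slow.square)
in.parallel <- mclapply(1:8, slow.square, mc.cores = n.cores)
```

Because each iteration is independent, swapping lapply for mclapply changes how the work is scheduled but not the result.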

However, we’re actually going to use some data on ratings of Seinfeld episodes, taken from the [Internet Movie Database](http://www.reddit.com/r/dataisbeautiful/comments/1g7jw2/seinfeld_imdb_episode_ratings_oc/).

Columns are Season (number), Episode (number), Title (of the episode), Rating (according to IMDb) and Votes (to construct the rating).

Make sure it’s sorted sensibly

Biologically, this could be Site / Individual / ID / Mean size / Things measured.

Hypothesis: Seinfeld used to be funny, but got progressively less good as it became too mainstream. Or, does the mean episode rating per season decrease?

Now, we want to calculate the average rating per season:

and so on until:

As with most things, we could automate this with a for loop:
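A sketch of the for loop version. The data frame dat here holds made-up ratings for three seasons (not the real IMDb numbers), using the Season and Rating column names from the text:

```r
# Made-up ratings for three seasons (not the real IMDb numbers).
dat <- data.frame(
  Season = rep(1:3, each = 3),
  Rating = c(7.0, 8.0, 9.0, 8.0, 8.5, 9.0, 6.0, 7.0, 8.0)
)

seasons <- sort(unique(dat$Season))
rating  <- numeric(length(seasons))  # pre-allocate the result vector
for (i in seq_along(seasons)) {
  rating[i] <- mean(dat$Rating[dat$Season == seasons[i]])
}
```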

That’s actually not that horrible to do. But it could be nicer. We first split the ratings by season:

Then use sapply to loop over this list, computing the mean

Then if we wanted to apply a different function (say, compute the per-season standard error) we could just do:
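Pulling the last three steps together as a sketch, again on made-up ratings (not the real IMDb numbers). The variable names are assumptions:

```r
# Made-up ratings for three seasons (not the real IMDb numbers).
dat <- data.frame(
  Season = rep(1:3, each = 3),
  Rating = c(7.0, 8.0, 9.0, 8.0, 8.5, 9.0, 6.0, 7.0, 8.0)
)

# Split: a list with one vector of ratings per season.
rating.by.season <- split(dat$Rating, dat$Season)

# Apply and combine: the mean of each piece, as a named vector.
mean.by.season <- sapply(rating.by.season, mean)

# A different function on the same pieces: the standard error.
se.by.season <- sapply(rating.by.season,
                       function(x) sqrt(var(x) / length(x)))
```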

But there’s still repetition there. Let’s abstract that away a bit.

Suppose we want a:

  1. response variable (like Rating was)
  2. grouping variable (like Season was)
  3. function to apply to each level
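Those three ingredients can be wrapped in a function along these lines. The name summarise.by.group and its argument names are assumptions, and the ratings are made up:

```r
# Split the response by group, then apply func to each piece.
summarise.by.group <- function(response, group, func) {
  response.by.group <- split(response, group)
  sapply(response.by.group, func)
}

# Made-up ratings for three seasons (not the real IMDb numbers).
dat <- data.frame(
  Season = rep(1:3, each = 3),
  Rating = c(7.0, 8.0, 9.0, 8.0, 8.5, 9.0, 6.0, 7.0, 8.0)
)

rating.mean <- summarise.by.group(dat$Rating, dat$Season, mean)
rating.max  <- summarise.by.group(dat$Rating, dat$Season, max)
```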

This just writes out exactly what we had before

We can compute the mean rating by season again:

which is the same as what we got before:

Of course, we’re not the first people to try this. This is exactly what the tapply function does (but with a few bells and whistles, especially around missing values, factor levels, additional arguments and multiple grouping factors at once).

So using tapply, you can do all the above manipulation in a single line.
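A short sketch comparing the hand-rolled split-and-sapply approach with the tapply one-liner, on made-up ratings (not the real IMDb numbers):

```r
# Made-up ratings for three seasons (not the real IMDb numbers).
dat <- data.frame(
  Season = rep(1:3, each = 3),
  Rating = c(7.0, 8.0, 9.0, 8.0, 8.5, 9.0, 6.0, 7.0, 8.0)
)

by.hand   <- sapply(split(dat$Rating, dat$Season), mean)
by.tapply <- tapply(dat$Rating, dat$Season, mean)
```

Both give the same numbers, labelled by season; tapply returns them as a one-dimensional array rather than a plain named vector.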

There are a couple of limitations of tapply.

The first is that getting the season out of tapply is quite hard. We could do:

But that’s quite ugly, not least because it involves the conversion numeric -> string -> numeric.

Better could be to use

But that requires knowing what is going on inside of tapply (that unique levels are sorted and data are returned in that order).
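The two options just described might look like this. This is a sketch on made-up ratings, and the variable names are assumptions:

```r
# Made-up ratings for three seasons (not the real IMDb numbers).
dat <- data.frame(
  Season = rep(1:3, each = 3),
  Rating = c(7.0, 8.0, 9.0, 8.0, 8.5, 9.0, 6.0, 7.0, 8.0)
)
mean.rating <- tapply(dat$Rating, dat$Season, mean)

# Option 1: recover the seasons from the names of the result
# (numeric -> string -> numeric).
season.from.names <- as.numeric(names(mean.rating))

# Option 2: rely on tapply returning results in sorted order of the
# unique grouping values.
season.from.data <- sort(unique(dat$Season))
```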

I suspect that this approach:

is probably the most fool-proof, but it’s certainly not pretty.

However, the returned format is extremely flexible. If you do:

The aggregate function provides a simplified interface to tapply that avoids this issue. It has two interfaces: the first is similar to what we used before, but the grouping variable now must be a list or data frame:

(note that dat['Season'] returns a one-column data frame). The column ‘x’ is our response variable, Rating, grouped by season. We can get its name included in the column names here by specifying the first argument as a data.frame too:

The other interface is the formula interface, that will be familiar from fitting linear models:

This interface is really nice; we can get the number of votes here too.
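A sketch of both aggregate interfaces on made-up data (not the real IMDb numbers). The column names Season, Rating and Votes follow the text; the values and result names are invented:

```r
# Made-up ratings and vote counts for three seasons.
dat <- data.frame(
  Season = rep(1:3, each = 3),
  Rating = c(7.0, 8.0, 9.0, 8.0, 8.5, 9.0, 6.0, 7.0, 8.0),
  Votes  = c(100, 120, 110, 200, 210, 190, 300, 310, 290)
)

# First interface: response, then the grouping variable as a data frame.
a1 <- aggregate(dat$Rating, dat["Season"], mean)     # columns: Season, x
a2 <- aggregate(dat["Rating"], dat["Season"], mean)  # columns: Season, Rating

# Formula interface, as used when fitting linear models.
a3 <- aggregate(Rating ~ Season, dat, mean)

# Several response variables at once.
a4 <- aggregate(cbind(Rating, Votes) ~ Season, dat, mean)
```

Unlike tapply, every result is a data frame that carries the season as an ordinary column.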

If you have multiple grouping variables, you can write things like:

to apply a function to each pair of levels of factor1 and factor2.

The replicate function is great in Monte Carlo simulation situations. For example, suppose that you flip a fair coin n times and count the number of heads:

You can run the trial a bunch of times:

and get a feel for the results. If you want to replicate the trial 100 times and look at the distribution of results, you could do:
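A sketch of the coin-flipping experiment. The function name trial and the use of runif to simulate a fair coin are assumptions:

```r
set.seed(1)

# Flip a fair coin n times and count the heads.
trial <- function(n) sum(runif(n) < 0.5)

# Run the trial a few times by hand...
trial(10)
trial(10)

# ...or let replicate repeat it 100 times and collect the counts.
results <- replicate(100, trial(10))
table(results)
```

Note that replicate re-evaluates the expression each time, so every element of results comes from a fresh set of random flips.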

and then you could plot these:

“for” loops shine where the output of one iteration depends on the result of the previous iteration.

Suppose you wanted to model a random walk. At every time step, move left or right with 50% probability.

Start at position 0

Move left or right with probability p (0.5 = unbiased)

Update the position

Let’s abstract the update into a function:

Repeat a bunch of times:

To find out where we got to after 20 steps:
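Putting the steps above together as a sketch. The function name take.step is an assumption, as is the convention that a draw below p means a move to the left:

```r
set.seed(1)

# Move one unit left with probability p, otherwise one unit right.
take.step <- function(x, p = 0.5) {
  if (runif(1) < p) x - 1 else x + 1
}

x <- 0                 # start at position 0
for (i in 1:20) {
  x <- take.step(x)    # each step depends on the previous position
}
x                      # position after 20 steps
```

This is a case where the order of iteration matters, so a for loop is the natural tool.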

If we want to collect where we’re up to at the same time:

Pulling that into a function:

We can then do 30 random walks:
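A sketch of a walk function that records every position, and of running it 30 times. The names take.step and random.walk and the argument defaults are assumptions:

```r
set.seed(1)

# Move one unit left with probability p, otherwise one unit right.
take.step <- function(x, p = 0.5) {
  if (runif(1) < p) x - 1 else x + 1
}

# Return the whole path: the start plus the position after each of n steps.
random.walk <- function(n, p = 0.5, start = 0) {
  x <- numeric(n + 1)
  x[1] <- start
  for (i in seq_len(n)) {
    x[i + 1] <- take.step(x[i], p)
  }
  x
}

# 30 independent walks of 20 steps: a 21 x 30 matrix, one column per walk.
walks <- replicate(30, random.walk(20))
```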

Of course, in this case, if we think in terms of vectors we can actually implement a random walk using implicit vectorisation:
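One way to write the vectorised version, as a sketch: draw all the steps at once and accumulate them with cumsum. The function name and arguments are assumptions chosen to match a loop-based random.walk(n, p, start):

```r
set.seed(1)

# Draw all n steps in one go, then accumulate them into positions.
random.walk <- function(n, p = 0.5, start = 0) {
  steps <- ifelse(runif(n) < p, -1, 1)
  start + c(0, cumsum(steps))
}

# Callers do not change: still a 21 x 30 matrix, one column per walk.
walks <- replicate(30, random.walk(20))
```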

Which reinforces one of the advantages of thinking in terms of functions: you can change the implementation detail without the rest of the program changing.