Monte Carlo Club: Difference between revisions

Revision as of 22:25, 2 April 2017

(This Week) March 31, 2017 "Drift-Mutation Balance" (Due 4/7/2017)

We will explore genetic drift of a DNA fragment with mutation under the Wright-Fisher model. From last week's exercise, we concluded that a population will sooner or later lose its genetic diversity if no new alleles are generated (e.g., by mutation or migration).

Mutation, in contrast, increases genetic diversity over time. Under neutrality (no natural selection for or against any mutation, e.g., in an intron sequence), the population will reach an equilibrium when the loss of genetic diversity by drift is cancelled out by the gain in genetic diversity from mutation.

Your job is to find this equilibrium point by simulation, given a population size (N) and a mutation rate (mu). The expected answer is pi=2N*mu, where pi is a measure of genetic diversity from DNA sequences: the average number of pairwise sequence differences within a population. Here pi is measured per base (the average pairwise difference divided by the sequence length L) and mu is the per-base mutation rate.
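A short coalescent argument shows where pi = 2N*mu comes from. It assumes a haploid population of constant size N (as in last week's model), treats pi as per base, and ignores repeated mutations at the same site:

```latex
% Two sampled lineages coalesce with probability 1/N per generation,
% so their coalescence time T is geometric.
\Pr(T = t) = \frac{1}{N}\left(1 - \frac{1}{N}\right)^{t-1}, \qquad \mathbb{E}[T] = N
% Each of the two lineages accumulates mutations at rate \mu per base per generation.
\mathbb{E}[\pi] = 2\mu\,\mathbb{E}[T] = 2N\mu
% Example: N = 100, \mu = 10^{-4} gives \pi = 0.02 per base,
% i.e. 2NL\mu = 200 differences between two 10^4-base sequences.
```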

A suggested algorithm is:

Start with a homogeneous population of N=100 identical DNA sequences (e.g., with a length of L=1e4 bases, or about 5 genes) [R hint: dna <- sample(c("a", "t", "c", "g"), size=1e4, replace=TRUE, prob=rep(0.25, 4))]
Write a mutation function that mutates the DNA according to a Poisson process (since mutation is a rare event). For example, if mu=1e-4 per base per individual per generation (too high to be realistic, but results converge faster), then the expected number of mutations in our DNA segment would be L * mu = 1 per individual per generation. You would then simulate the random number of mutations with the R function rpois(n=1, lambda=1).
Apply the mutation function to each individual DNA copy (N=100 in total) during gamete production (100 gametes per individual) in every generation (for a total of G=1000 generations).
In each generation, instead of counting allele frequencies (as in last week's problem), you will need another function to calculate and output the average pairwise differences among the 100 individuals.
Finally, plot pi against generation.
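The suggested algorithm can be sketched in Python with NumPy. This is a minimal, hypothetical implementation rather than a reference solution: bases are coded as integers 0-3 instead of letters, mutations are applied to each sampled gamete (a choice the steps above leave open), and the function names `drift_mutation` and `mean_pairwise_diff` are made up for illustration. Instead of looping over all 4,950 pairs, pi is computed from base counts per site, using the identity that the number of differing pairs at a site is (N^2 - sum of n_b^2) / 2, where n_b is the count of base b.

```python
import numpy as np

def mean_pairwise_diff(pop):
    """Average number of differing sites over all pairs of rows (N x L array, bases 0-3)."""
    n = pop.shape[0]
    # counts[b, j] = number of sequences carrying base b at site j
    counts = np.stack([(pop == b).sum(axis=0) for b in range(4)])
    # number of pairs that differ at each site: (n^2 - sum_b counts^2) / 2
    diff_pairs = (n * n - (counts ** 2).sum(axis=0)) / 2
    return diff_pairs.sum() / (n * (n - 1) / 2)

def drift_mutation(N=100, L=10_000, mu=1e-4, G=1000, gametes=100, seed=None):
    """Return pi per base for each generation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    ancestor = rng.integers(0, 4, size=L, dtype=np.int8)
    pop = np.tile(ancestor, (N, 1))               # homogeneous starting population
    pool = np.repeat(np.arange(N), gametes)       # parent index of every gamete
    pis = np.empty(G)
    for t in range(G):
        # draw N gametes without replacement from the pool of N * gametes copies
        parents = rng.choice(pool, size=N, replace=False)
        pop = pop[parents]                        # fancy indexing returns a copy
        # Poisson number of new mutations per offspring (mean L * mu)
        n_mut = rng.poisson(mu * L, size=N)
        rows = np.repeat(np.arange(N), n_mut)
        cols = rng.integers(0, L, size=rows.size)
        # shift to one of the three other bases; a rare double hit on one site counts once
        pop[rows, cols] = (pop[rows, cols] + rng.integers(1, 4, size=rows.size)) % 4
        pis[t] = mean_pairwise_diff(pop) / L
    return pis
```

With the defaults (N=100, L=1e4, mu=1e-4, G=1000), plotting `drift_mutation()` against generation should show pi rising from 0 and then fluctuating around 2N*mu = 0.02; the fluctuations are large because pairwise diversity is strongly affected by the random genealogy of the sample.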

March 18, 2017 "Genetic Drift" (Due 3/31/2017)

This is our first biological simulation. Mandatory assignment for all lab members (from interns to doctoral students). An expected result is shown in the graph.

Task: Simulate the Wright-Fisher model of genetic drift as follows:

Begin with an allele frequency p=0.5 and a population of N=100 haploid individuals [Hint: pop <- c(rep("A", 50), rep("a", 50))]
Each individual produces 100 gametes, giving a total of 10,000 gametes [Hint: use a for loop with rep() function]
Sample from the gamete pool another 100 to give rise to a new generation of individuals [Hint: use sample() function]
Calculate allele frequency [Hint: use table() function]
Repeat the above in succession for a total of g=1000 generations [Hint: create a function with three arguments, e.g., wright.fisher(pop.size, gamete.size, generation.time)]
Plot allele frequency changes over generation time
Be prepared to answer these questions:
1. Why does allele frequency fluctuate even without natural selection?
2. What is the final fate of the population: one allele left, or two alleles coexisting indefinitely?
3. Which population (small or large) can maintain genetic polymorphism (with two alleles) longer?
4. Which population (small or large) gets fixed (only one allele remains) more quickly?
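Because every individual contributes the same number of identical gametes and the next generation is drawn without replacement, the number of "A" individuals in the next generation follows a hypergeometric distribution. A compact NumPy sketch under that assumption (function names are hypothetical) replaces the explicit gamete list with one hypergeometric draw per generation and also reports when one allele is lost:

```python
import numpy as np

def wright_fisher(pop_size=100, gamete_size=100, generations=1000, p0=0.5, seed=None):
    """Frequency of allele A in each generation (index 0 = starting population)."""
    rng = np.random.default_rng(seed)
    n_a = int(round(pop_size * p0))            # number of A individuals
    freqs = [n_a / pop_size]
    for _ in range(generations):
        # pool: n_a * gamete_size A gametes and (pop_size - n_a) * gamete_size a gametes;
        # drawing pop_size of them without replacement is a hypergeometric draw
        n_a = int(rng.hypergeometric(n_a * gamete_size,
                                     (pop_size - n_a) * gamete_size,
                                     pop_size))
        freqs.append(n_a / pop_size)
    return np.array(freqs)

def fixation_time(freqs):
    """First generation at which one allele is lost, or None if both remain."""
    fixed = np.flatnonzero((freqs == 0) | (freqs == 1))
    return int(fixed[0]) if fixed.size else None
```

Running `fixation_time(wright_fisher(100))` and `fixation_time(wright_fisher(1000))` over many seeds is one way to explore questions 3 and 4.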

Submitted Codes

By John

Python

import numpy as np
import sys
from random import sample
import matplotlib.pyplot as plt
 
def simulator(gametes_rate, next_individuals, generations):
    individuals = [1 for i in range(50)] + [0 for j in range(50)]
    frequency = []
    for generation in range(generations):
        gametes = []
        for individual in individuals:
            gametes += [individual for i in range(gametes_rate)]
        individuals = sample(gametes, next_individuals)
        frequency.append(np.count_nonzero(individuals) / len(individuals))
    return(frequency)
 
N_100 = simulator(100, 100, 1000)
N_1000 = simulator(100, 1000, 1000)
 
# Create plots with pre-defined labels.
# Alternatively, pass labels explicitly when calling `legend`.
fig, ax = plt.subplots()
ax.plot(N_100, 'r', label='N=100')
ax.plot(N_1000, 'b', label='N=1000')
 
# Add x, y labels and title
plt.ylim(-0.1, 1.1)
plt.xlabel("Generation")
plt.ylabel("Frequency")
plt.title("Wright-Fisher Model")
 
# Now add the legend with some customizations.
legend = ax.legend(loc='upper right', shadow=True)
 
# The frame is matplotlib.patches.Rectangle instance surrounding the legend.
frame = legend.get_frame()
frame.set_facecolor('0.90')
 
# Set the fontsize
for label in legend.get_texts():
    label.set_fontsize('large')
 
for label in legend.get_lines():
    label.set_linewidth(1.5)  # the legend line width
plt.show()

#Scala
import scala.util.Random
import scala.collection.mutable.ListBuffer
 
val gamete_rate = 100
val offspr_rate = 100
val generations = 1000
 
var frequency: List[Double] = List()
 
var individuals = List.fill(50)(0) ++ List.fill(50)(1)
 
for(generation <- 1 to generations){
  var gametes = individuals.map(x => List.fill(gamete_rate)(x)).flatten
  individuals = Random.shuffle(gametes).take(offspr_rate)
  frequency = frequency :+ individuals.count(_ == 1).toDouble / offspr_rate
}
 
print(frequency)

#Spark
val gamete_rate = 100
val offspr_rate = 100
val generations = 300
 
var frequency: List[Double] = List()
var individuals = sc.parallelize(List.fill(50)("0") ++ List.fill(50)("1"))
 
for(generation <- 1 to generations){
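  // replicate each individual gamete_rate times (String.split("") behaves differently across Java versions)
  val gametes = individuals.flatMap(x => Seq.fill(gamete_rate)(x))
  individuals = sc.parallelize(gametes.takeSample(false, offspr_rate))
  val count = individuals.countByValue
  // getOrElse avoids an exception once allele "1" has been lost
  frequency = frequency :+ count.getOrElse("1", 0L).toDouble / offspr_rate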
}
 
print(frequency)

By Brian

Wright <- function(pop.size, gam.size, generation) {
  if (pop.size %% 2 == 1) {
    print("Initial Population should be even. Try again")
    return(invisible(NULL))
  }
  pop <- c(rep("A", pop.size*.5), rep("a", pop.size*.5))   # p = 0.5
  geneticDrift <- data.frame(frequency = .5, time = 0)
  for (i in 1:generation) {
    largePop <- rep(pop, gam.size)                 # gamete pool
    samplePop <- sample(largePop, pop.size)        # next generation
    nA <- sum(samplePop == "A")                    # count "A" by name; table() order depends on locale
    propA <- nA/pop.size
    geneticDrift <- rbind(geneticDrift, c(propA, i))
    pop <- c(rep("A", nA), rep("a", pop.size - nA))
  }
  plot(geneticDrift$frequency ~ geneticDrift$time, type = "b",
       main = paste("Genetic Drift N =", pop.size), xlab = "time",
       ylab = "Proportion Pop A", col = "red", pch = 18)
}
Wright(1000, 100, 1000)

By Jamila

Genetic_code = function(t, R) {   # R is not used
  N <- 100
  p <- 0.5
  frequency <- numeric()
  for (i in 1:t) {
    A1 <- rbinom(1, 2*N, p)         # number of A1 copies among 2N gene copies
    p <- A1/(N*2)
    frequency[length(frequency)+1] <- p
  }
  plot(frequency, type="l", ylim=c(0,1), col=3, xlab="t", ylab=expression(p(A[1])))
}
Genetic_code(1000)

By Sharon

p <- 0.5
N <- 100
g <- 0
pop <- c(rep("A", 50), rep("a", 50)) # create a population with 2 alleles
data <- data.frame(g, p)             # generation and frequency of allele A
for (i in 1:1000) {                  # generations
  gam_pl <- rep(pop, 100)            # gamete pool
  gam_sam <- sample(gam_pl, N)       # sample the next generation from the gamete pool
  nA <- sum(gam_sam == "A")          # count allele A by name (table() order depends on locale)
  all_freq <- nA/N                   # allele frequency of A
  data <- rbind(data, c(i, all_freq))
  pop <- c(rep("A", nA), rep("a", N - nA))  # keep both alleles in the new population
}
plot(data$g, data$p, type = "l")

By Sipa

pop <- c(rep("A", 50), rep("a", 50))
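wright_fisher <- function(N, gen_time, gam) {
  # gam: starting number of A alleles (e.g., 50 when N = 100 and p = 0.5)
  x <- numeric(gen_time)
  x[1] <- gam
  for (i in 2:gen_time) {
    k <- x[i-1]/N                      # current frequency of A
    n <- seq(0, N, 1)
    prob <- dbinom(n, N, k)            # binomial sampling of the next generation
    x[i] <- sample(0:N, 1, prob = prob)
  }
  plot(x[1:gen_time], type = "l", xlab = "generation", ylab = "number of A alleles")
}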
pool <- rep(pop, times = 100)
s_pool <- sample(pool, size = 100, replace = F)
table(s_pool)
wright_fisher2 <- function(pop_size, al_freq, gen_time) {
  nA <- round(pop_size * al_freq)
  pop <- c(rep("A", nA), rep("a", pop_size - nA))
  for (i in 1:gen_time) {
    pool <- rep(pop, times = 100)                        # rebuild the gamete pool every generation
    pop <- sample(pool, size = pop_size, replace = FALSE)
  }
  a.f <- sum(pop == "A")/pop_size                        # final frequency of allele A
  return(a.f)
}

By Nicolette

N <- 100
p <- 0.5
g <- 1000
sim <- 100
X <- array(0, dim = c(g, sim))      # rows: generations; columns: replicate populations
X[1, ] <- rep(N*p, sim)
for (i in 1:sim) {
  for (j in 2:g) {
    X[j, i] <- rbinom(1, N, prob = X[j-1, i]/N)
  }
}
Genetic_drift <- data.frame(X/N)
matplot(1:g, X/N, type = "l", ylab = "allele_frequency", xlab = "generations")

March 11, 2017 "Stoplights" (Due 3/18/2017)

Source: Paul Nahin (2008). "Digital Dice", Problem 18.
Challenge: How many red lights, on average, will you have to wait for on your journey from a city block m streets and n avenues away from Belfer [with coordinates (1,1)]? (assuming equal probability for red and green lights)
Note that one only has to wait for a green light when walking along either the north side of 69th Street or the east side of 1st Avenue. At all other intersections, one can walk non-stop by crossing in the other direction whenever a red light is on.
Formulate your solution with the following steps:

Start from the (m+1, n+1) corner and end at the (1, 1) corner
The average number of red lights for m=n=0 is zero
Find the average number of red lights for m=n=1 by simulating the walk 100 times
Increment m and n by 1 (keeping m=n) until m=n=1000
Plot the average number of red lights against m (or n).
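The steps above can be sketched in Python as follows. This is a hypothetical sketch assuming that the two lights at an intersection are complementary (green in one direction means red in the other), so waiting only happens once the walker reaches 69th Street or 1st Avenue, where each remaining intersection is red with probability 1/2 and costs at most one wait:

```python
import random

def red_lights(m, n, rng):
    """Red lights waited for on one walk from (m+1, n+1) to (1, 1)."""
    streets, avenues = m, n          # blocks still to cover in each direction
    # Interior: always cross in whichever direction shows green, so no waiting.
    while streets > 0 and avenues > 0:
        if rng.random() < 0.5:
            streets -= 1
        else:
            avenues -= 1
    # Along 69th St or 1st Ave only one direction is left: each crossing is red
    # with probability 1/2, and a red light means exactly one wait.
    return sum(rng.random() < 0.5 for _ in range(streets + avenues))

def average_red_lights(m, n, reps=100, seed=None):
    rng = random.Random(seed)
    return sum(red_lights(m, n, rng) for _ in range(reps)) / reps
```

For the plot, something like `[average_red_lights(k, k) for k in range(0, 1001, 50)]` keeps the run short; stepping by 1 up to 1000 works too but takes longer in pure Python.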

Submitted Codes

By John

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas import DataFrame
from random import sample
from itertools import combinations, permutations
from numpy import count_nonzero, array
 
def red_light(av, st):
    traffic_light = ["red_st", "red_av"]
    total_cross = av + st - 1
    while av != 0 and st != 0:
        if sample(traffic_light, 1)[0] == "red_st":
            av -= 1
        else:
            st -= 1
    rest_cross = av if av != 0 else st
    return(count_nonzero([sample([0, 1], 1) for cross in range(rest_cross)]))
 
df = DataFrame(0, index=np.arange(10), columns=np.arange(10))
 
simulation = 1000
for av in df.index:
    for st in df.index:
        df.loc[av, st] = sum(array([[red_light(av, st)] for n in range(simulation)])) / simulation
        
plt.imshow(df, cmap='hot', interpolation='nearest')
plt.xticks(np.arange(1, 10))
plt.yticks(np.arange(1, 10))
plt.title("Average Numbers Waiting for Red Light")
plt.xlabel("Number of Av")
plt.ylabel("Number of St")
plt.show()
 
df
 
# Recursive Function for Each Trial
 
av = 5
st = 5
traffic_light = ["red_av", "red_st"]
def route(av, st):
    while not av == st == 0:
        if sample(traffic_light, 1)[0] == "red_st":
            if av == 0:
                print("wait at Av. {0} St. {1}".format(av, st))
                return(route(av, st - 1))
            else:
                print("Proceed Av. {0}".format(av - 1))
                return(route(av - 1, st))
        else:
            if st == 0:
                print("Wait at Av. {0} St. {1}".format(av, st))
                return(route(av - 1, st))
            else:
                print("Proceed St. {0}".format(st - 1))
                return(route(av, st - 1))
route(av, st)

By Weigang

#!/usr/bin/env perl
use strict;
use warnings;

# Use recursion
my ($distx, $disty) = @ARGV;
my $num_red = 0; # keep red-light counts
print "-" x $distx, "\n"; # print a starting line
&walk($distx, $disty, \$num_red); # pass reference not value
exit;

sub walk {
    my ($x, $y, $ref) = @_;
    my $ct = $$ref;
    my $prob_red;
    if ($x == 1 && $y == 1) { # Reached destination
	print "*\n";
	print "-" x $distx, "\n"; # print the ending line
	print "reached Belfer after waiting for ", $ct, " red lights\n";
    }

    if ($x == 1 && $y > 1) { # Reached right-side end
	$prob_red = rand();
	if ($prob_red < 0.5) { # red light, wait
	    $ct++;
	    print "w\n", " " x ($distx-1);
	} else {
	    print "|\n", " " x ($distx-1);
	}
	&walk($x, $y-1, \$ct);
    }

    if ($x > 1 && $y == 1) { # Reached bottom
	$prob_red = rand();
	if ($prob_red < 0.5) { # red light, wait
	    $ct++;
	    print "w";
	} else { print "-"}
	&walk($x-1, $y, \$ct);
    }

    if ($x > 1 && $y > 1) {
	my $prob_across = rand(); # prob of walking right with green light
	if ($prob_across >= 0.5) { # move one block right
	    print "-";
	    &walk($x-1, $y, \$ct);
	} else { # red light, move one block down
	    print "|\n", " " x ($distx-$x);
	    &walk($x, $y-1, \$ct);
	}
    }
}

By Jeff

def walk (walker):
    import random
    
    def just_walk (x):                                  #[1]
        if random.choice(["red_m_direction", "green_m"])=="green_m": x["loca"]["m"] -=1
        else: x["loca"]["n"] -=1
        return x    
    def may_have_to_wait (dist,waited):                 #[2]
        if random.choice(["green","red"])=="red": waited += 1   # wait once; the light then turns green
        dist -= 1                                       # cross this intersection either way
        return (dist,waited)    

    while (0 not in walker["loca"].values()):         #start walking
        walker = just_walk(walker) 
    while (walker["loca"]["m"] !=0):       # if n =0 and m != 0
        (walker["loca"]["m"], walker["waited"]) = may_have_to_wait(walker["loca"]["m"],walker["waited"]) 
    while (walker["loca"]["n"] !=0):       # if m =0 and n != 0
        (walker["loca"]["n"], walker["waited"]) = may_have_to_wait(walker["loca"]["n"],walker["waited"]) 

    return walker["waited"]

def given_distance(m,n,rep): 
    waited = list()
    for x in range(rep):
        walker={"loca":{"m":m,"n":n}, "waited":0}     #walker variable
        waited.append(walk(walker))      
    return sum(waited)/len(waited)
# main
result=list()
for d in range(1000):                                  # set walking distance here
        result.append(given_distance (d,d,100))      # set rep here
import matplotlib.pyplot as plt                     # ploting 
plt.plot(result); plt.show()  

#[1] no reason to wait for ANY red before one of the distances (m or n) is exhausted, assuming that when the light is green for m it must be red for n, and vice versa. 
#[2] when one of the directions (m or n = 0), waiting cannot be avoided, start counting.

March 3, 2017 "PiE" (Due 3/10/2017)

Source: Paul Nahin (2008). "Dueling Idiots", Problem 5.
Challenge: obtain numerical values of pi and e by simulation
Simulate pi by

Randomly generate 10,000 pairs of numbers uniformly distributed between 0 and 1 (simulating throwing darts onto the unit square shown at right)
Count the number of points enclosed within the quarter-circle
Calculate the pi value from this proportion
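The dart-throwing steps translate directly into a vectorized NumPy sketch (the function name `estimate_pi` is hypothetical). A dart at (x, y) lands inside the quarter circle of radius 1 when x^2 + y^2 <= 1, and that region covers pi/4 of the unit square:

```python
import numpy as np

def estimate_pi(n_darts=10_000, seed=None):
    """Estimate pi from the fraction of random darts landing in the quarter circle."""
    rng = np.random.default_rng(seed)
    x = rng.random(n_darts)              # dart coordinates, uniform on [0, 1)
    y = rng.random(n_darts)
    inside = x**2 + y**2 <= 1.0          # inside the quarter circle of radius 1
    return 4 * inside.mean()             # fraction inside estimates pi/4

print(estimate_pi())
```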

Simulate e by

Generate N random numbers from 0 to 1
Divide the interval from 0 to 1 into N equal-width bins
Count the number of bins Z that receive none of the random numbers
Obtain e ~ N/Z (each bin stays empty with binomial probability (1 - 1/N)^N, which approaches 1/e for large N, so Z ~ N/e)
Simulate N=1e2, 1e3, and 1e4
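A NumPy sketch of the bin-counting estimate follows (the function name `estimate_e` is hypothetical). Rather than testing every bin against every number, it computes each number's bin index directly and counts the distinct indices; N should be large enough (e.g., 1e2 or more) that at least one bin is empty:

```python
import numpy as np

def estimate_e(N, seed=None):
    """Estimate e as N / (number of empty bins) after dropping N uniforms into N bins."""
    rng = np.random.default_rng(seed)
    u = rng.random(N)                                # N uniform numbers in [0, 1)
    bins = np.minimum((u * N).astype(int), N - 1)    # bin index 0 .. N-1
    Z = N - np.unique(bins).size                     # bins that received nothing
    return N / Z

for N in (100, 1_000, 10_000):
    print(N, estimate_e(N, seed=N))
```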

Submitted Codes

By Weigang

# simulate pi:
darts <- sapply(1:1000, function(x) { coords<- runif(2); return(ifelse(coords[1]^2+coords[2]^2 <= 1, 1,0)) })
pi <- 4*mean(darts)

# simulate e
N <- 1e3;
n <- runif(N);
cts <- numeric();
for(i in 0:(N-1)) {            # all N bins, including [0, 1/N)
  left <- i/N;
  right <- (i+1)/N;
  cts[i+1] <- length(which(n>=left & n < right))
}
e <- N/length(which(cts==0))

By Nicolette

#Simulate pi
random <- runif(1000, 0, 1)                 # x coordinates of darts
square <- runif(1000, 0, 1)                 # y coordinates of darts
plot(random, square)
curve(sqrt(1 - x^2), from = 0, to = 1, add = TRUE)   # quarter-circle boundary
inside <- square <= sqrt(1 - random^2)      # darts inside the quarter circle
areaofqc <- mean(inside)                    # estimates pi/4
4 * areaofqc                                # estimate of pi

#Simulate e
e <- 0;
for(N in 1:1000) {
  numbers<-runif(N)
bin.size<-1/N
  non.empty<-as.integer(numbers/bin.size)
  z.empt<- N - length(table(non.empty))
  e<-c(e, N/z.empt)
  }
plot(e)

By Brian

#When you throw a dart at a unit-square dartboard, the probability of landing inside the quarter circle of radius 1 centered at (0,0) is pi/4. Hitting the quarter circle is therefore a Bernoulli random variable with p = pi/4, so E(hit) = pi/4.
# Imagine scoring the game of darts as follows: 1 point if the dart lands in the quarter circle and 0 points if it misses.
# If you throw many darts, you perform that many iid Bernoulli trials. By the law of large numbers, the number of hits divided by the number of darts thrown will be close to E(Bernoulli) = pi/4. Multiply that number by 4 and you have an estimate of pi.

## Generate 1,000,000 pairs of points in the unit square with uniform distribution

x<-c(runif(1000000, min = 0, max = 1))
y<-c(runif(1000000, min = 0, max = 1))
point<-data.frame(x,y)

plot(point$x,point$y)
point.sub<-subset(point,y<=sqrt(1-x^2))

plot(point.sub$x,point.sub$y)

z<-4*nrow(point.sub)/1000000
z
error<-pi-z
error

## simulating e
# It's well known that a binomial with large n and tiny p is well approximated by a Poisson with lambda = np.
# Here n = 10,000 points and p = 1/10,000 (the probability of falling into a particular bin), so lambda = 1
# and each bin is empty with probability about exp(-1). Hence (number of bins)/(empty bins) estimates e.
simexp <- function(n) {
  a <- runif(n, min = 0, max = 1)
  # count points in each bin ((i-1)/n, i/n]; tabulate() avoids an n-by-n loop
  b <- tabulate(ceiling(a * n), nbins = n)
  nzero <- sum(b == 0)              # number of empty bins
  print(n / nzero)
}
simexp(1000)
simexp(10000)
simexp(100000)

By John

#python code
import numpy as np
from numpy import count_nonzero
from random import uniform

# Simulate pi
simulation = 10000
distances = np.array([pow(uniform(0, 1), 2) + pow(uniform(0, 1), 2) for i in range(simulation)])
pi = count_nonzero(distances < 1) / simulation * 4
pi

# Simulate e
total = 10000
sample_space = [uniform(0, 1) for x in range(total)]
# bin index of each point; the set keeps only the non-empty bins
index = {min(int(point * total), total - 1) for point in sample_space}
e = total / (total - len(index))
e

Feb 24, 2017 "Idiots" (Due 3/3/2017)

Source: Paul Nahin (2000), "Dueling Idiots". Problem #2: "When Idiots Duel"
Game: Idiots A and B decide to duel with a gun in the following way: They will insert a single bullet into the gun's cylinder. The cylinder has a total of 6 slots. Idiot A will spin the cylinder and shoot at B. If the gun doesn't fire, then A will give the gun to B, who will spin the cylinder and then shoot at A. This back-and-forth duel will continue until one fool shoots (and kills) the other.
Questions: (1) what is the probability that A will win (and B dies); (2) What is the average number of trigger pulls before someone dies?
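A Python sketch of the duel, assuming the cylinder is re-spun before every pull so that each pull fires with probability 1/6 independently of the others (`duel` and `simulate_duels` are hypothetical names). A fires on the odd-numbered pulls and B on the even-numbered ones. For comparison, summing the geometric series gives P(A wins) = (1/6) / (1 - (5/6)^2) = 6/11, about 0.545, and the number of pulls is geometric with mean 6:

```python
import random

def duel(rng):
    """Return (winner, pulls) for one duel; the cylinder is spun before every pull."""
    pulls = 0
    while True:
        pulls += 1
        if rng.randrange(6) == 0:        # the bullet is under the hammer
            return ("A" if pulls % 2 == 1 else "B"), pulls

def simulate_duels(n=100_000, seed=None):
    rng = random.Random(seed)
    results = [duel(rng) for _ in range(n)]
    p_a = sum(w == "A" for w, _ in results) / n
    mean_pulls = sum(p for _, p in results) / n
    return p_a, mean_pulls

print(simulate_duels(seed=1))
```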

Submitted Codes

By Roy

gun <-c(1,0,0,0,0,0)
gun2 <- c(1,0,0,0,0,0)
deadman <-0 ;idiot1win <-0; idiot2win <- 0; nooned <- 0; total <- 0; pulls <- 0
 while(deadman < 1000){
  idiot1shot <-sample(gun)              # A spins the cylinder and pulls the trigger
  pulls <- pulls + 1
  if (idiot1shot[1] == 1){
    deadman <- deadman + 1
    idiot1win <- idiot1win + 1
  }else{
      idiot2shot <-sample(gun2)         # B spins the cylinder and pulls the trigger
      pulls <- pulls + 1
      if (idiot2shot[1] == 1){
        deadman <- deadman +1
        idiot2win <- idiot2win +1
      } else {nooned <- nooned + 1}     # nobody died this round
  }
  total <- total + 1                    # rounds played (one or two pulls each)
  }


deadman
idiot1win
idiot2win
nooned
total

p <- idiot1win/1000                     # probability that A wins
p*100
takes2kill <- pulls/deadman             # average number of trigger pulls per duel
takes2kill
idiot1shot
idiot2shot

By Weigang

rounds <- sapply(1:1000, function(x) {
  alive <- 1;
  round <- 0;
  while(alive == 1){
    spin <- sample(c(0,0,0,0,0,1));
    round <- round + 1;
    if(spin[1] == 1) { alive <- 0 }
  }
  return(round)
})
prob.a.live <- mean(rounds %% 2 == 1) # A pulls on odd rounds, so an odd final round means A fired and survived
barplot(table(rounds)/1000, xlab="num rounds", ylab="Prob", main = "Sudden Death with a 6-slot gun (sim=1000)", las=1)

Feb 17, 2017 "Birthday" (Due 2/24/2017)

Problem: What is the probability that NONE of the N people in a room share a birthday?

Randomly select N individuals and record their B-days
Check whether ANY two individuals share a B-day, and count the trial if none do
Repeat (for each N) 1000 times, and obtain probability by averaging the previous counts (i.e., divided by 1000)
Vary N from 10 to 100, increment by 10
Plot probability of no-shared B-Day (Y-axis) versus N (x-axis), with either a stripchart or boxplot, or both
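A Python sketch of the simulation, assuming 365 equally likely birthdays (February 29 ignored); `prob_no_shared` and `exact_no_shared` are hypothetical names. Under the same assumption the exact answer is the product (365/365)(364/365)...((365 - N + 1)/365), which is handy for checking the simulated points:

```python
import random

def prob_no_shared(n_people, reps=1000, seed=None):
    """Fraction of simulated rooms in which all n_people birthdays differ."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        bdays = [rng.randrange(365) for _ in range(n_people)]
        if len(set(bdays)) == n_people:      # no birthday appears twice
            hits += 1
    return hits / reps

def exact_no_shared(n_people):
    p = 1.0
    for k in range(n_people):
        p *= (365 - k) / 365
    return p

for n in range(10, 101, 10):
    print(n, prob_no_shared(n, seed=n), round(exact_no_shared(n), 4))
```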

Submitted Codes

By Roy

days <- seq(as.Date('1990/01/01'), as.Date('1990/12/31'), by="day")  # 365 possible birthdays
samples <- 10:100
prob.nodub <- numeric(length(samples))
for (j in seq_along(samples)) {
  N <- samples[j]                                   # group size for this round
  counting.no.dups <- 0
  for (i in 1:1000) {
    bdays <- sample(days, N, replace = TRUE)
    if (!any(duplicated(bdays))) {                  # no two people share a birthday
      counting.no.dups <- counting.no.dups + 1
    }
  }
  prob.nodub[j] <- counting.no.dups/1000
  print(c(N, prob.nodub[j]))
}
plot(samples, prob.nodub, main="Birthday Simulation", xlab = "Sample Size", ylab = "Probability of no Duplicates", las = 1)

By weigang

days <- 1:365;

find.overlap <- function(x) { return(length(which(table(x)>1))) }

output <- sapply(1:100, function(x) { # num of people in the room
  ct.no.overlap <- 0;
  for (k in 1:100) {
    bdays <- sample(days, x, replace = T);
    ct <- find.overlap(bdays);
    if (!ct) { ct.no.overlap <- ct.no.overlap + 1}
  }
  return(ct.no.overlap);
})
 
plot(1:100, output/100, xlab="Group size", ylab = "Prob (no shared b-day)", las=1, main="B-day (sim=100 times)", type="l")
abline(h=0.5, lwd=2, col=2, lty=2)
abline(v=seq(0, 100, 5), col="gray")

Feb 10, 2017 "Dating" (Valentine's Day Special; Due 2/17/2017)

Source: Paul Nahin (2008), "Digital Dice". Problem #20: "An Optimal Stopping Problem"
Problem: What is the optimal point at which one should stop dating new people and settle on a mate (and live with the decision)?
Your best strategy is to date an initial sample of N individuals, rejecting all of them, and then marry the next one ranked higher than any of those N individuals. The question is: what is the optimal value of N?

The problem could be investigated by simulating a pool of 10 individuals, ranked from 1-10 (most desirable being 1) and then take a sample of N
You may only date one individual at a time
You cannot go back to reach previously rejected candidates
Simulate N from 0 to 9 (zero means marrying the first date, a sample size of zero)
For each N, obtain the probability of finding the perfect mate (i.e., ranked 1st) by running simulation 1000 times
Plot barplot of probability versus sample size N.
Expected answer: N=4
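A Python sketch of the same experiment (function names hypothetical), together with the standard closed-form success probability for this strategy: N/10 times the sum of 1/(i - 1) for i = N+1 to 10, and 1/10 when N = 0. Evaluating the formula, N = 3 (about 0.399) and N = 4 (about 0.398) come out nearly tied, so 1000 simulations per N are unlikely to separate them:

```python
import random

def success_rate(n_sample, n_candidates=10, reps=1000, seed=None):
    """Simulated probability of marrying the No. 1 candidate after rejecting n_sample dates."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        order = list(range(1, n_candidates + 1))     # rank 1 = most desirable
        rng.shuffle(order)                           # order in which candidates are dated
        best_seen = min(order[:n_sample], default=n_candidates + 1)
        # marry the first later date who beats everyone sampled (None if nobody does)
        chosen = next((r for r in order[n_sample:] if r < best_seen), None)
        wins += chosen == 1
    return wins / reps

def exact_success(n_sample, n_candidates=10):
    """Closed-form probability for the same strategy."""
    if n_sample == 0:
        return 1 / n_candidates
    return n_sample / n_candidates * sum(1 / (i - 1) for i in range(n_sample + 1, n_candidates + 1))

for n in range(10):
    print(n, success_rate(n, seed=n), round(exact_success(n), 4))
```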

Submitted Codes

By Weigang

pick.candidate <- function(min, array) {
  for (i in 1:length(array)) {
    if (array[i] < min) {
      return(array[i])
    } else {next}
  }
  return(0) # No 1. has been sampled and rejected
}
candidates <- 1:10;
output <- sapply(0:9, function(x) {
  ct <- 0;
  for(k in 1:1000) {
    if (x==0) { # no sample, marry the 1st guy 
      sampled <- sample(candidates, 1);
      if (sampled == 1) {ct <- ct+1}
    } else {
      sampled <- sample(candidates, x);
      not.sampled <- candidates[-sampled];
      not.sampled <- sample(not.sampled);
      if (pick.candidate(min =  min(sampled), array = not.sampled) == 1) {ct <- ct+1}
    }
  }
  return(ct);
})
barplot(output/1e3, names.arg = 0:9, xlab = "number of sampled dates", las=1, main = "Optimal stopping for dating (N=10 candidates)", ylab = "Prob(marrying No.1)")

Feb 3, 2017 "US Presidents" (Due 2/10/2017)

Download File:Presidents.txt (tab-separated; the 1st column is the order, the 2nd column is the name, and the 3rd column is the year of inauguration).
Your job is to create an R, Perl, or Python script called “us-presidents”, which will

Read the table
Store the original/correct order
Shuffle/permute the rows and record the new order
Count the number of matching orders
Repeat Steps 3-4 a total of 1000 times
Plot histogram or barplot (better) to show distribution of matching counts
Hint: For R, use the sample() function. For Perl, use the rand() function.
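The shuffle-and-count steps can be sketched in Python without reading the file, since only the order column matters; the sketch assumes 45 rows (one per president, as in the R loop over 1:45 below) and uses hypothetical function names. Each position matches with probability 1/45, so the mean number of matches is exactly 1, and the distribution is close to Poisson with mean 1:

```python
import random
from collections import Counter

def count_matches(n_rows, rng):
    """Shuffle positions 1..n_rows and count how many land in their original place."""
    order = list(range(1, n_rows + 1))
    shuffled = order[:]
    rng.shuffle(shuffled)
    return sum(a == b for a, b in zip(order, shuffled))

def match_distribution(n_rows=45, reps=1000, seed=None):
    """Counter mapping number of matches -> number of shuffles with that many matches."""
    rng = random.Random(seed)
    return Counter(count_matches(n_rows, rng) for _ in range(reps))

dist = match_distribution()
for k in sorted(dist):
    print(k, dist[k])
```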

Submitted Codes

By Mei

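pres <- read.table("Presidents.txt", sep = "\t", header = FALSE, col.names = c("order", "name", "inaug.year"))  # file from the download above
pres.list <- lapply(1:1000, function(x) pres[sample(nrow(pres)),])
cts <- sapply(pres.list, function(x) {
  ct.match <- 0;
  for (i in 1:nrow(pres)){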
    if (pres$order[i] == x[i,1]){
      #cat(as.character(x[i,2]), x[i,1], "\n")  #as.character to avoid the factor info
      ct.match <- ct.match + 1;
      #cat(ct.match)
    }
  }
  ct.match #return
})
barplot(table(cts),xlab = "Number of matches per shuffle", border = "hotpink", col = "pink", ylab = paste("Frequency total of:",length(pres.list)), main = "US Presidents")

By John

import pandas as pd
import numpy as np
from pandas import DataFrame
import matplotlib.pyplot as plt

df = pd.read_table("presidents.txt", names=["num", "name", "presidency"])

name_list = list(df.name)

# create a list to store matches after each shuffle
shuffle_record = []

# create a function in which the first argument is the original dataset; second argument is number of shuffles
def shuffler2(original, n):
    record = []
    for _ in range(n):
        each_shuffle = {}
        temp = original.reindex(np.random.permutation(original.index)) # do shuffling for each
        # compare row positions, not index labels (Series == Series requires identical labels)
        compare = original["num"].values == temp["num"].values
        matched_df = original.loc[compare]
        for idx in matched_df.index:
            each_shuffle[idx] = matched_df["name"][idx]
        shuffle_record.append(each_shuffle)
        record.append(int(compare.sum()))  # number of matching positions
    return(record)
result2 = shuffler2(df, 1000)

plt.hist(result2, color="yellow")
plt.title("Histogram for President Data")
plt.show()

By Weigang

p <- read.table("Presidents.txt", sep="\t", header=F)
colnames(p) <- c("order", "name", "inaug.year")
# Use "sapply" or "lapply" for loops: no need to pre-define a vector to store results
p.sim <- sapply(1:10000, function (x) {
  length(which(sample(p$order) == p$order))
  }) 
barplot(table(p.sim)/1e4, las=1)
p.exp <- rpois(1e4, 1) # draw 10000 Poisson random deviates
mp <- barplot(table(p.exp)/1e4, las=1, xlab = "Num of matching presidents") # mid-point on x-axis
lines(mp[seq_along(table(p.sim))], table(p.sim)/1e4, type="b", col=2) # add the simulated distribution as a line over the Poisson-expected barplot
legend("topright", c("Poisson expectation (bars)", "Simulated (line)"), col=1:2, lty=1)
