Feed on
Posts
Comments

I recently needed to fit the extensible worm-like chain (EWLC) model to several force-extension curves from  some of my single molecule experiments. In these experiments, a DNA/RNA hybrid was stretched using optical tweezers and I wished to estimate the persistence length, the contour length and the stretch modulus.

As the EWLC is an implicit function, this requires implicit fitting. I have implemented this in iGor Pro and  will provide the code below so that you don’t have to waste time implementing it yourself.

It is assumed that you have the extension data in a wave called “[prefix]_y_real_dist” and the force data in a wave called “[prefix]_y_force”. If not, update the code to match your wave names.

First the actual implicit function, it takes three arguments:

  1. w: wave of parameters for the EWLC:
    w[0] = kB
    w[1] = T
    w[2] = Lp
    w[3] = Stretch modulus
    w[4] = Contour length
    w[5] = Extension offset (hold at zero if you know you have zero initial extension)
  2. x: Extension of tether (not a wave, a single value)
  3. y: Force at extension (not a wave, a single value)


Function FitEWLC(w,x,y): FitFunc
Wave w
Variable x
Variable y

return (w[0]*w[1])/w[2] * ((1/(4*(1-(x+w[5])/w[4] + y/w[3])^2)) - 1/4 + (x+w[5])/w[4] - y/w[3]) - y
End

For performing the actual fitting I have written a small function, DoFitEWLC, which lets you control a few things about the fitting.

DoFitEWLC takes four arguments:

  1. prefix: I use this because I have many waves with similar names except for the prefix. The prefix is a string.
  2. forceLimit: If you don’t want to fit your entire dataset, enter the upper limit of the force
  3. offset: you guess of the initial extension
  4. holdStr: If you want to hold any of the EWLC parameters, you probably want to, enter the hold string here. Example, if you want to hold the boltzmann constant and temperature the holdStr should be “110000″


Function DoFitEWLC(prefix, forceLimit, offset, holdStr)
//--
String prefix
Variable forceLimit
Variable offset
String holdStr
//--
// Declare wave names with source data
String real_dist
String force

sprintf real_dist, "%s_y_real_dist" prefix
sprintf force, "%s_y_force" prefix

wave w_real_dist = $real_dist
wave w_force = $force

if (!WaveExists(w_real_dist))
DoAlert/T="DataError" 0, "Extension:\rUnable to locate wave for trap1 Y-distance data\rLooked for wave named: \""+real_dist + "\""
return -1
endif

if (!WaveExists(w_force))
DoAlert/T="DataError" 0, "Force:\rUnable to locate wave for trap1 Y-force data\rLooked for wave named: \""+force + "\""
return -1
endif

// Find the range in which we want to fit
Variable n = numpnts(w_force)
Variable i = 0
Variable curr_F = 0

do
curr_F = w_force[i]
i += 1

// If limit is higher than the highest force - use full wave
if (i > n)
i = n
break
endif

while (curr_F < forceLimit)

// Declare waves for fit
String ewlcFitY
String ewlcFitX

sprintf ewlcFitY, "%s_EWLC_Y" prefix
sprintf ewlcFitX, "%s_EWLC_X" prefix

wave w_ewlcFitY = $ewlcFitY
wave w_ewlcFitX = $ewlcFitX

// Declare waves to hold temporary force and disance
String tmp_real_dist = "EWLC_tmp_dist"
String tmp_force = "EWLC_tmp_force"

Wave w_tmp_real_dist = $tmp_real_dist
Wave w_tmp_force = $tmp_force

// Make waves to hold fit values
Duplicate/D/O/R=(0,i) w_real_dist, $ewlcFitX, $ewlcFitY

// Copy the desired range for force and extension
Duplicate/D/O/R=(0,i) w_real_dist, $tmp_real_dist
Duplicate/D/O/R=(0,i) w_force, $tmp_force

// Make wave for parameters
Make/D/O ewlcCoefs={(1.38e-23*1e12*1e9),293,50,1000,300,offset}

FuncFit/ODR=3/H=holdStr FitEWLC, ewlcCoefs /X={$tmp_real_dist,$tmp_force}/XD={$ewlcFitX,$ewlcFitY}

End

The initial guesses for the fitting parameters are:
kB: 1.38e-23 J/K
T: 293 K
Lp: 50 nm
K: 1000 pN
Lc: 300 nm

So, does it work? I simulated data for a tether with a persistence length of 50 nm, a contour length of 300 nm and a stretch modulus of 1000 pN. The force data is in wave “NT_y_force” and extension data is in wave “NT_y_real_dist”. Issuing the following command

DoFitEWLC("NT",30,0,"110001")

Will produce the following output which shows that all parameters were estimated accurately:


Fit converged properly
ewlcCoefs={0.0138,293,50,1000,300,0}
V_chisq= 4.75695e-06;V_npnts= 5987;V_numNaNs= 0;V_numINFs= 0;
W_sigma={0,0,6.07e-05,0.00142,1.4e-05,0}
Coefficient values ± one standard deviation
w_0 =0.0138 ± 0
w_1 =293 ± 0
w_2 =50 ± 6.07e-05
w_3 =1000 ± 0.00142
w_4 =300 ± 1.4e-05
w_5 =0 ± 0

Please note that the holdStr limits the number of free parameters to three and that we only fit at forces up to 30pN.

The figure below shows the simulated data (black) and the fitted data (red).
EWLC_fit

The tuesday boy problem

An interesting “problem” was proposed by Gary Foshee and it goes as follows:

“I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?”

Well, what is the answer?

The answer is thirteen divided by twenty seven and not 1/3 as the most obvious answer would be.

As with other funny stuff, like the Monty Hall problem, I like to make simple simulations to verify the answer.

So, in Python v2.5 I simulated 3 x 40,000,000 childbirths (3 x 20,000,000 births of two childs) and counted the number of times one of the two was a boy born on a Tuesday. If a boy was born on a Tuesday, I counted the number of times the other child was also a boy.

The result is clear, the probability of the other child being a boy was found to be 0.48159 +/- 0.00036 (mean +/- SD). This quite close to (and not statistically different from) the expected value of 13/27 = 0.48148

UPDATE

Decision Science News has a cool graphical explanation for the problem – they also arrive at 13/27

Python code:

1 import random
2
3 boy_tuesday = 0
4 two_boys = 0
5
6 for i in range(1,20000000):
7
8   gender1 = random.randint(0,1)
9   day1 = random.randint(0,6)
10  gender2 = random.randint(0,1)
11  day2 = random.randint(0,6)
12
13
14  if ((gender1 == 0 and day1 == 1) or (gender2 == 0 and day2 == 1)):
15    boy_tuesday = boy_tuesday + 1
16
17  if (gender2 == 0 and gender1 == 0):
18    two_boys = two_boys + 1

During the development of the restriction “engine” of DiffCutter I have had the opportunity to compare the fragments produced by DiffCutter to those produced by DNA Strider. During this process I came across what I believe to be a bug in DNA Strider. I don’t know if it is a well know bug, however, if you are not aware of it you could end up in problems. The problem arises when sites for enzymes that cut outside the restriction sequence overlap in different directions.

For example, consider AlwI which has the following restriction sequence GGATC(N)4. If you hace a sequence  like this “GGATCC” you see that  there are two AlwI sites in oppersite directions and that the last site, “GATCC” ,will cut the sequence before “GGATC”.  This is not properly identified by Strider, and if you construct a sequence of any sequence (with out AlwI sites) and insert “GGATCC” somewhere in the middle, Strider will produce two fragments  - which is correct. However, Strider will tell you that the two fragments have a length of 12 bp and -14 bp – this is not correct.

I have not tested all possible sequences, but I expect the bug to affect all sequences for enzymes that cut outside their recognition sequence when the sites are overlapping (or just close enough) in different directions.

This is caused by the fact that Strider uses the first base of the restriction site as the cleavage-site. It does not account for the fact that many enzymes cut further inside the sequence or even outside of it.

DiffCutter will not make this mistake…

One of the master students in our lab, Kim Rønne, used DiffCutter last week to investigate if he had obtained the desired plasmid from a number of transformants. The restriction analysis was made difficult by the fact that the desired plasmid was very similar to the plasmid initially used. If fact, the two plasmids differs only by three nucleotides, which does not provide a unique restriction site. Of course, the three nucleotides create an additional restriction site (otherwise restriction analysis was not the way to go) but this site was for an enzyme that already had a lot of sites in the plasmid. Thus, digestion with two enzymes was the only way to get a clear distinction between the two plasmids.

Checking all combinations of two enzymes can be quite tedious when done by hand which is why DiffCutter was used. A combination of two enzymes, TasI and SspI, was identified as the best option, although the distinction would require a very nice agarose gel as the difference was quite small.

The image below shows the simulated gel created by DiffCutter (left), the real gel (middle) and some of the important features that DiffCutter was able to predict.

DiffCutter versus real gel

DiffCutter versus real gel

From the image above, one can see that DiffCutter was one the right track (the gel thickness of the simulated gel and the real gel differs slightly). The band that enables the plasmids to be differentiated are encircled (red) on the right figure. This band is clearly visible on the real gel as well. Additionally, a thick band (two overlapping fragments) were also identified by DiffCutter (blue circle).

Overall, the simulated gel has captured pretty much all of the features of the real gel – except for the absolute migration (keep in mind that the thickness of the simulated gel differs from that of the real gel).

So… Use DiffCutter :-)

DiffCutter – 0.9.1.0b

Yet again a new version of DiffCutter have been released.

Updates from version 0.9.0.07b:

  • New design
  • New algorithm for calculation of scores (faster and better than the previous version)
  • Improved error-handling
  • New FAQ page where the most frequent questions are submitted

Bugs fixed from version 0.9.0.07b:

  • If the program is unable to complete the operation within the time-frame allowed it will now terminate more elegantly than in the previous version which simply hang

To use the new version of DiffCutter please follow this link

For the next version of DiffCutter (not online yet) I have been trying to improve the speed at which the score for a given set of fragments is calculated. For dual-cutting this speed is very important as scores must be calculated for each combination of two enzymes (potentially about 45,000 times). In order to achieve relative short response times the score must therefore be calculated quite fast.

So, now I have made some improvements on the scoring algorithm – including writing some function on my own that actually exists within the PHP language itself. I’m quite surprised by the improvements I was able to achieve by using my own functions…

In order to benchmark the new algorithm I made a little test on my laptop. I simulated 2,000 score-calculations three times for each length of the input vector. I increased the length of the input vector in increments of 1 until a length of 100 was reached. This means that I now have 100 average values based on 3x 2,000 score-calculations. I also made a control where the input vector was of a constant length.

The result of this benchmark is shown below:

Benchmark old and new algorithm

Benchmark old and new algorithm

It is clear from the image above, that the new algorithm scales much better than the old algorithm does. The new algorithm scales linearly with the number of DNA fragments and it is only about 13 times slower when 100 fragments are passed compared to when 2 fragments are passed.
The old algorithm scales much worse and is about 60 times slower when 100 fragments are passed compared to when 2 fragments are passed.

The figure above doesn’t really tell anything about which algorithm is fastest, so to show this I have compared the absolute speed of the old algorithm versus the new algorithm. The figure below shows how much faster the new algorithm is compared to the old one.

Speed comparison between the old and the new algorithm

Speed comparison between the old and the new algorithm

The figure above shows that the new algorithm is faster than the old one for all conditions tested. For few fragments the new algorithm is about 2.5 times faster than the old one. However, as the new algorithm scales much better than the old one, the new algorithm is about 13 times faster than the old one when 80 fragments are passed (I stopped the test after 80 fragments as the old algorithm was getting really slow).

Hopefully I will be able to improve the performance even more, but I’m quite satisfied with the improvements made until now…

A few days ago I ran into a little problem – is Proline and amino acid or an imino acid? There is no doubt that proline differs from the other 19 “common” alpha amino acids used in proteins, but in my view it is an amino acid non the less. Proline contains a secondary amino group while the other 19 contain a primary amino group.

Looking at the definition for the two possibilities: imino or secondary amino acid there is little doubt as to which group Proline falls into (in my view).

Imino group: >C=NH Secondary amino group: R-NH-R’

Non the less, one can find many places where Proline is called an imino acid in stead of an amino acid. Up until now, nobody have been able to explain to me why Proline should be an imino acid and until someone can do that I will continue to call it an amino acid…

Feel free to correct me – but I do want an explanation :)

My own little theory is that the name imino sticks to Proline from the good old days and that the error has propagated through the literature. On PubChem Proline is also listed as an imino acid, but after communicating with their support I learned that they call in an imino acid because some literature calls it an imino acid…

Today DiffCutter got its first version number : 0.9.0.07b.

No updates have been made to DiffCutter itself except that it now shows the version number and the time for the update.

Behind the scenes a system has been made that allows easy “porting” from development version to live version. This system will automatically update the version number according to a chosen versioning scheme and take a backup of the old system. Creating this system was necessary (and perhaps a bid overdue) in order for me to avoid developing on the live site – as I have done up until now – Sorry

With regard to the versioning scheme I would like to say that I am not a computer scientist so I took the liberty of making my own simple system :) I guess that the postfix “b” is redundant given that beta is indicated by having a version number below 1.0 :) What can I say? I guess that I can version my own software however I like. Maybe I just have an “affinity for disobedience” – to quote a very smart (and possibly very real) person

DiffCutter – Updated!

Yes, the moment you have all been waiting for has arrived – DiffCutter has been updated.

Apart from the design (which is now fashionably green-ish) the following updates have been made on the underlying code:

  • Algorithm for selection of gel thickness for the virtual gel have been improved
  • The digestion engine has been updated to allow practically any type of restriction sequence. Basically for now this means that non-palindromic cutters are supported, but more exotic enzymes such as BplI is also supported
  • Support for multiple enzyme-schemes (i.e. New England Biolabs, Fermentas, ect.). This way you can choose the enzymes that you use in Your lab
  • Support for dual digestion, i.e. digest with two enzymes at once
  • A “print view”-type option have been implemented that allows for easy printing of each digestion reaction
  • The “print view” contains a mass list that assists in choosing the correct amount of DNA to digest

That was all for this time…

Being a guy that on a regular basis need to perform restriction analysis in order to verify e.g. the identity of a plasmid, selecting a relevant restriction enzyme is important. Often this task is quite simple but at other times this might not be the case…

 

In order to make the selection of an appropriate enzyme faster (and somewhat easier) I wrote a small program that selects a subset of enzymes to choose from. The subset of enzymes returned by the program are selected to give a clear difference between the two passed sequences.

An example of the use of this program: I made a number of clonings where a small part of the same plasmid was changed. In two of these clones the altered sequence had the same length and almost the same sequence. I wanted to digest the two plasmids with an enzyme so that I could distinguish between the two. As no unique sites was introduced in either sequence, selecting an appropriate enzyme could have been a labour intensive task.
Giving the two sequences to DiffCutter the program identified several enzymes that produced a distinct fragment pattern in each sequence. One of these enzymes, HinP1I, produced 50+ fragments for each sequence. Normally this many fragments is problematic as they can produce a smear when separating them on a gel. However, the program identified a distinct band at approximately 500 bp that could be used to tell the difference between the two sequences. The simulated gel is shown below:

 

Simulated gel

Simulated gel using HinP1I

Despite the large number of fragments in each lane, the fragment at around 500 bp was easy to see. 

On a more general note, the output from DiffCutter looks something like this (for each enzyme):

 

Output from DiffCutter

Output from DiffCutter

 

 

So, how to use DiffCutter – it easy. Take the two sequences that you want to differentiate, press “digest” and use one of the the suggested enzymes.

The program is free to use and can be found here: DiffCutter – enjoy…

Older Posts »