Validation of a transformed pdf in R
Marek Rychlik

Purpose

This file contains R code in the form of an “R Notebook” (a .Rmd file) that illustrates how transforming a random variable changes its density. The example squares uniformly distributed data. We plot histograms of the original and transformed samples, and then overlay the theoretical density of the transformed variable on the histogram of the transformed sample.

One purpose this file can serve is to check a density of \(Y=g(X)\) derived from the formula \[ f_Y(y) = \sum_{x\in g^{-1}(y)} \frac{f_X(x)}{|g'(x)|}, \] where the sum runs over all points \(x\) that \(g\) maps to \(y\) (assuming \(g\) is differentiable with \(g'(x)\neq 0\) at those points). If the derivation is correct, there should be no significant discrepancy between the histogram of the transformed sample and the theoretical density.
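For the example in this Notebook, \(X\) is uniform on \([-1,2]\), so \(f_X(x) = 1/3\) there, and \(g(x) = x^2\), so \(|g'(\pm\sqrt{y})| = 2\sqrt{y}\). For \(0 < y < 1\) both preimages \(\pm\sqrt{y}\) lie in \([-1,2]\); for \(1 \le y \le 4\) only \(+\sqrt{y}\) counts, because \(-\sqrt{y} \le -1\) and the single point \(y = 1\) has probability zero. The formula then gives the piecewise density that the function 'f' in the Implementation section encodes:

\[ f_Y(y) = \begin{cases} \dfrac{1/3}{2\sqrt{y}} + \dfrac{1/3}{2\sqrt{y}} = \dfrac{1}{3\sqrt{y}}, & 0 < y < 1, \\[1ex] \dfrac{1/3}{2\sqrt{y}} = \dfrac{1}{6\sqrt{y}}, & 1 \le y \le 4, \\[1ex] 0, & \text{otherwise.} \end{cases} \]

As a consistency check, \[ \int_0^1 \frac{dy}{3\sqrt{y}} + \int_1^4 \frac{dy}{6\sqrt{y}} = \frac{2}{3} + \frac{1}{3} = 1, \] so \(f_Y\) is a proper density.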

With some R skills (sampling from a distribution and writing functions), you can reuse this Notebook to validate your own derivations.
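One way to reuse the Notebook is to wrap its steps in a function. The sketch below is a hypothetical helper, 'validate_density' (not part of the original Notebook), that takes a sampler for \(X\), the transformation \(g\) and a candidate density \(f_Y\), overlays \(f_Y\) on the histogram of \(g(X)\), and, going slightly beyond a visual check, returns the p-value of a chi-squared goodness-of-fit test ('chisq.test') on the histogram bins, with bin probabilities obtained from 'integrate'. It assumes \(f_Y\) accepts a numeric vector.

```r
# Hypothetical helper (not part of the original Notebook): sample X, transform it
# with g, overlay a candidate density f_Y on the histogram of g(X), and return the
# p-value of a chi-squared goodness-of-fit test on the histogram bins.
# Assumes f_Y is vectorized (accepts and returns numeric vectors).
validate_density <- function(sampler, g, f_Y, n = 1e5, breaks = 100) {
  y <- g(sampler(n))                                  # transformed sample g(X)
  h <- hist(y, breaks = breaks, freq = FALSE, col = "grey90",
            main = "Histogram of g(X) and candidate density")
  xs <- seq(min(h$breaks), max(h$breaks), length.out = 1000)
  lines(xs, f_Y(xs), col = "red", lwd = 2)            # candidate density on top
  # Theoretical probability of each histogram bin, by numerically integrating f_Y
  p_bins <- vapply(seq_along(h$counts), function(i)
    integrate(f_Y, h$breaks[i], h$breaks[i + 1])$value, numeric(1))
  # Goodness-of-fit test of the bin counts; rescale.p = TRUE absorbs small
  # numerical errors that keep sum(p_bins) from being exactly 1
  chisq.test(h$counts, p = p_bins, rescale.p = TRUE)$p.value
}

# Usage with this Notebook's example: X ~ Uniform(-1, 2), Y = X^2
f <- Vectorize(function(y) {
  if (y <= 0) 0 else if (y < 1) 1 / (3 * sqrt(y)) else if (y <= 4) 1 / (6 * sqrt(y)) else 0
})
set.seed(1)
p_value <- validate_density(function(n) runif(n, -1, 2), function(x) x^2, f)
p_value  # a very small p-value would point to an error in the derived density
```

A p-value that is not small is consistent with the derivation, while a tiny p-value with a large sample usually signals a mistake in \(f_Y\). The test only checks agreement at the resolution of the chosen bins, so it complements rather than replaces looking at the plot.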

The steps

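Below we conduct a numerical experiment with the following steps:

- generate a uniformly distributed sample on the interval \([-1,2]\) and call it 'data';
- transform the data using the mapping \(y=g(x)=x^2\) and call the transformed sample 'xformed';
- plot the histograms of the original data 'data' and the transformed data 'xformed';
- plot the theoretical density of \(Y\) together with the histogram of 'xformed'.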

Implementation

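```r
set.seed(1)        # for reproducibility; the seed value is arbitrary
n <- 100000        # sample size
nbreaks <- 200     # number of histogram bins

# Uniform sample on [-1, 2]
data <- runif(n, -1, 2)
hist(data, breaks = nbreaks, main = "Original Data")

# The transformation g(x) = x^2, applied to every sample point
g <- function(x) x^2
xformed <- g(data)
hist(xformed, breaks = nbreaks, main = "Transformed Data")
```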
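Besides the histograms, two summary statistics of the transformed sample can be compared with exact values: \(E[Y] = E[X^2] = \mathrm{Var}(X) + E[X]^2 = 9/12 + 1/4 = 1\) and \(P(Y<1) = P(-1<X<1) = 2/3\). A minimal sketch, drawing a fresh sample under the hypothetical names 'x_check' and 'y_check' so that the Notebook's 'data' and 'xformed' are left untouched:

```r
# Quick numerical checks on a freshly drawn transformed sample.
# x_check and y_check are hypothetical names, chosen so that the Notebook's
# 'data' and 'xformed' are not overwritten.
set.seed(42)
x_check <- runif(1e5, -1, 2)   # X ~ Uniform(-1, 2)
y_check <- x_check^2           # Y = g(X) = X^2

# E[Y] = E[X^2] = Var(X) + E[X]^2 = 3^2/12 + (1/2)^2 = 1
mean(y_check)

# P(Y < 1) = P(-1 < X < 1) = 2/3
mean(y_check < 1)
```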

Plot the transformed data over the theoretical density

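```r
# Theoretical density of Y = X^2 for X ~ Uniform(-1, 2):
#   1/(3*sqrt(y)) on (0, 1), 1/(6*sqrt(y)) on [1, 4], 0 elsewhere.
# Vectorize() lets f accept a numeric vector, which plot() requires.
f <- Vectorize(function(y) {
  if (y <= 0) 0
  else if (y < 1) 1 / (3 * sqrt(y))
  else if (y <= 4) 1 / (6 * sqrt(y))
  else 0
})
```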
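A quick sanity check on 'f' is that it integrates to 1 over its support \([0,4]\). The sketch below uses R's 'integrate', splitting the range at \(y = 1\), where 'f' jumps; QUADPACK's adaptive routine behind 'integrate' handles the integrable \(1/\sqrt{y}\) singularity at \(y = 0\). It redefines 'f' so the block runs on its own.

```r
# Same density as in the Notebook, redefined so this block is self-contained
f <- Vectorize(function(y) {
  if (y <= 0) 0
  else if (y < 1) 1 / (3 * sqrt(y))
  else if (y <= 4) 1 / (6 * sqrt(y))
  else 0
})

# Integrate separately on [0, 1] and [1, 4] to avoid the jump of f at y = 1
mass_01 <- integrate(f, 0, 1)$value   # should be close to 2/3
mass_14 <- integrate(f, 1, 4)$value   # should be close to 1/3
mass_01 + mass_14                     # should be close to 1
```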

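```r
# Standalone plot of the density on [-1, 5]
plot(f, -1, 5, n = 200, main = "Transformed pdf")

# Density curve with the histogram of the transformed sample added on top;
# freq = FALSE scales the bars to densities so they are comparable with f
plot(f, -1, 5, n = 200, col = rgb(1, 0, 0, 1), main = "pdf + histogram")
hist(xformed, freq = FALSE, add = TRUE, col = rgb(1, 0, 0, 1/4))
```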
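The overlay is a visual check. For a single number, one option is a one-sample Kolmogorov-Smirnov test ('ks.test' in base R) against the CDF obtained by integrating the density: \(F_Y(y) = \tfrac{2}{3}\sqrt{y}\) on \([0,1)\) and \(F_Y(y) = (1+\sqrt{y})/3\) on \([1,4]\). This is a sketch under the Notebook's setup; 'F_Y' and 'y_ks' are hypothetical names, and the smaller sample of \(10^4\) points makes tied values (which 'ks.test' warns about) unlikely.

```r
# Theoretical CDF of Y = X^2 for X ~ Uniform(-1, 2), obtained by integrating f:
#   F_Y(y) = (2/3) sqrt(y) on [0, 1),  (1 + sqrt(y)) / 3 on [1, 4]
F_Y <- function(y) {
  s <- sqrt(pmin(pmax(y, 0), 4))   # clamp y to the support [0, 4] before sqrt
  ifelse(s < 1, 2 * s / 3, (1 + s) / 3)
}

set.seed(7)
y_ks <- runif(1e4, -1, 2)^2        # fresh transformed sample
ks_result <- ks.test(y_ks, F_Y)    # one-sample Kolmogorov-Smirnov test
ks_result$p.value                  # a tiny value would indicate a mismatch
```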
