I think that @melsiddieg was correct to complain about the performance of RcppSimdJson::fload. Something does not add up. Given how fast RcppSimdJson is, fload should be roughly as fast as curl::curl_download. But it is not!
> url <- "http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/rest/v4/hsapiens/feature/gene/TET1/snp?limit=200&skip=-1&skipCount=false&count=false&Output%20format=json&merge=false"
> res <- microbenchmark::microbenchmark(straight = curl::curl_download(url, tempfile()),
                                        jsonlite = jsonlite::fromJSON(url),
                                        simdjson = RcppSimdJson::fload(url),
                                        times = 5L)
> print(res)
Unit: milliseconds
     expr       min        lq      mean    median        uq       max neval
 straight  567.2850  568.8655  595.1786  570.3718  580.1068  689.2641     5
 jsonlite  714.3094  721.5960  733.4962  737.1303  744.9391  749.5061     5
 simdjson 2498.4776 2616.6448 2620.8897 2629.1768 2641.9882 2718.1610     5
The file is 784 KB. If you replace tempfile() with a file name and inspect the result, you will find that curl is indeed grabbing every byte.
So it seems that RcppSimdJson::fload could go faster by invoking curl_download and then parsing the resulting temporary file. It is also possible to load the file directly into memory (curl_fetch_memory), but I did not use that as the baseline, since you might argue (rightly so) that it would be cheating.
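A minimal sketch of the two approaches mentioned above, for anyone who wants to try them. The function names fload_via_curl and fparse_via_memory are hypothetical helpers, not part of RcppSimdJson; the sketch assumes the curl and RcppSimdJson packages are installed, and it is not a claim about how fload works internally.

```r
# Hypothetical helper: download with curl first, then parse the local file.
fload_via_curl <- function(url, ...) {
  tmp <- tempfile(fileext = ".json")
  on.exit(unlink(tmp), add = TRUE)           # remove the temporary file afterwards
  curl::curl_download(url, tmp, quiet = TRUE)
  RcppSimdJson::fload(tmp, ...)              # fload also accepts local file paths
}

# Hypothetical helper: fetch the payload into memory and parse the raw bytes.
fparse_via_memory <- function(url, ...) {
  resp <- curl::curl_fetch_memory(url)       # resp$content is a raw vector
  RcppSimdJson::fparse(resp$content, ...)
}
```

Adding something like `viacurl = fload_via_curl(url)` as another expression in the microbenchmark call above would show whether the gap comes from the download step or from the parsing step.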
I am not 100% clear on why there is such a difference, but it does warrant investigation.