-
Notifications
You must be signed in to change notification settings - Fork 50
Implementation of paging in function name_usage() #291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
While developing paging I was erroneously thinking that start = NULL one of the reasons why my code did'nt work. I checked it again and it seems that it doesn't affect paging at all! And that's nice.
After testing paging functionality by means of additional file and function name_usage_test, I import the changes into name_usage. I also (tried to) remove any code style changes in order to not interfere with scott coding style.
start = NULL would be removed by rgbif_compact function. It is ok for normal usage but it is a problem while paging: it owuld produce following error: Error in urltools::url_encode(x[[i]]) : Not compatible with STRSXP: [type=NULL]. Moreover, NULL + numeric is not like 0 + numeric.
I use splice function in name_usage from purrr package in order to put all data in chunks together
check_vals, originally used ONLY in occ_search(), is now used in name_usage() too. So, better to move it to file zzz.R in order to not have duplicates.
Update branch from upstream/master before submitting PR for implmentation of paging in name_usage() function.
Add {} within if else. Indentation added as well.
|
thanks! i'll have a look |
R/name_usage.r
Outdated
| # each of these args must be length=1 | ||
| if (!is.null(rank)) stopifnot(length(rank) == 1) | ||
| if (!is.null(name)) stopifnot(length(name) == 1) | ||
| if (!is.null(language)) stopifnot(length(language) == 1) | ||
| if (!is.null(datasetKey)) stopifnot(length(datasetKey) == 1) | ||
|
|
||
| args <- rgbif_compact(list(offset = start, limit = limit, | ||
| sourceId = sourceId)) | ||
| sourceId = sourceId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since now there are no repeated parameters that GBIF allows on this route, can you delete the args <- c(args, rank, datasetKey, name, language) line and put each of name, rank, and language into the rgbif_compact(list(...)) call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. 4fee1b2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And similarly: c51d7c3
R/name_usage.r
Outdated
| outout[[iter]] <- tt | ||
| } | ||
| out <- list() | ||
| out$results <- purrr::map(outout, ~purrr::splice(.$results)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't want to add a new pkg dependency unless there is very good reason to do so. Is the purrrr code doing something like lapply(outout, "[[", "results") ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. e88243a.
i copied your way of merging .results in occ_search. It works perfectly. So, purrr is removed.
R/name_usage.r
Outdated
| numreturned <- length(tt$results)} | ||
| sumreturned <- sumreturned + numreturned | ||
| # if less results than maximum | ||
| if ((numreturned > 0) & (numreturned < 1000)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like we want a && here instead of & since it doesn't look like you're looking for a vectorized output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done here: 4a7b004
tests/testthat/test-name_usage.r
Outdated
| @@ -232,3 +232,42 @@ test_that("fails with more than 1", { | |||
| "length\\(rank\\) == 1 is not TRUE") | |||
| }) | |||
|
|
|||
|
|
|||
| # paging | |||
| test_that("paging: class data and meta not modified by paging", { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks much for adding tests! Though can you do tests that don't take as much time as these? Before these new tests the test for this file take about 10 sec, and after take about 30 sec
of course once we have test caching (coming soon via vcr pkg nearly to cran) then the long running time won't matter so much since most of time is in the http request which will be cached
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. It takes too much time.
Unfortunately it's impossible to make it shorter because gbif takes chunks of 1000 and takes time for doing it.
I have commented all tests about paging (a6742f9). I added the suggestion to uncomment them after test caching will be implemented via vcr package.
|
offset is always in args, see line 95.
There is no good reason to introduce purrr package.
numreturned is not a vector. && is better.
Testing paging takes a lot of time. I commented them. Uncomment after introducing test caching via vcr package.
Example was updated in .R file but I forgot devtools::document().
|
thanks, looks good. |
Description
Inspired by paging functionality in occurrence related functions (e.g.
occ.search()), we from INBO have implemented paging in the checklist related functionname_usage(). It is in fact quite useful to download all species in a dataset at once. To do this, we tried to mantain the same structure and code style. Default limit (limit = 1000) has been mantained as well.In order to check the validity of parameter
limitas performed inocc_search(), I movedcheck_vals()to filezzz.R. In this way it can be used by both functions:name_usage()andocc_search().Moreover, there is a discrepancy among the default values of
startparameter betweenname_usage()andocc_search(). We opted to mantain the default valuestart = NULLas it is now. Probably it would be better to change it tostart = 0as inocc_***functions.Related Issue
This PR has been started and should partly tackle the issue #281
The extension of paging along
name_***can follows after this Pull Request.Please note that the command:
has been changed to:
in order to solve issue #290. Otherwise I couldn't perform the final check after syncing my fork to
upstream/master.Example
returns a list of two:
$metaand$datawhere:test$datahas 2676 rows,test$meta$offset = 2000,test$meta$limit = 676andtest$meta$endOfRecords = TOn the other hand, the code here below:
returns a list of two:
$metaand$datawhere:test$datahas 1843 rows,test$meta$offset = 1000,test$meta$limit = 843andtest$meta$endOfRecords = F.