Skip to content

Implementation of paging in function name_usage() #291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Mar 10, 2018
Merged

Implementation of paging in function name_usage() #291

merged 27 commits into from
Mar 10, 2018

Conversation

damianooldoni
Copy link
Collaborator

Description

Inspired by paging functionality in occurrence related functions (e.g. occ.search()), we from INBO have implemented paging in the checklist related function name_usage(). It is in fact quite useful to download all species in a dataset at once. To do this, we tried to mantain the same structure and code style. Default limit (limit = 1000) has been mantained as well.

In order to check the validity of parameter limit as performed in occ_search(), I moved check_vals() to file zzz.R. In this way it can be used by both functions: name_usage() and occ_search().
Moreover, there is a discrepancy among the default values of start parameter between name_usage() and occ_search(). We opted to mantain the default value start = NULL as it is now. Probably it would be better to change it to start = 0 as in occ_*** functions.

Related Issue

This PR has been started and should partly tackle the issue #281
The extension of paging along name_*** can follows after this Pull Request.

Please note that the command:

args <- rgbif_compact(list(offset = start, limit = limit,
                             sourceId = sourceId)

has been changed to:

args <- rgbif_compact(list(offset = start, limit = limit,
                             sourceId = sourceId,
                             datasetKey = datasetKey))

in order to solve issue #290. Otherwise I couldn't perform the final check after syncing my fork to upstream/master.

Example

# Name usage for all taxa in a dataset (set sufficient high limit)
test <- name_usage(datasetKey = "9ff7d317-609b-4c08-bd86-3bc404b77c42", limit = 4000)

returns a list of two: $meta and $data where: test$data has 2676 rows, test$meta$offset = 2000, test$meta$limit = 676 and test$meta$endOfRecords = T

On the other hand, the code here below:

# Name usage for all taxa in a dataset (set sufficient high limit)
test <- name_usage(datasetKey = "9ff7d317-609b-4c08-bd86-3bc404b77c42", limit = 1843)

returns a list of two: $meta and $data where: test$data has 1843 rows, test$meta$offset = 1000, test$meta$limit = 843 and test$meta$endOfRecords = F.

While developing paging I was erroneously thinking that start = NULL one of the reasons why my code did'nt work.  I checked it again and it seems that it doesn't affect paging at all! And that's nice.
After testing paging functionality by means of additional file and function name_usage_test, I import the changes into name_usage.
I also (tried to) remove any code style changes in order to not interfere with scott coding style.
start = NULL would be removed by rgbif_compact function. It is ok for normal usage but it is a problem while paging: it owuld produce following error:
Error in urltools::url_encode(x[[i]]) :
  Not compatible with STRSXP: [type=NULL].
Moreover, NULL + numeric is not like 0 + numeric.
I use splice function in name_usage from purrr package in order to put all data in chunks together
check_vals, originally used ONLY in occ_search(), is now used in name_usage() too. So, better to move it to file zzz.R in order to not have duplicates.
Update branch from upstream/master before submitting PR for implmentation of paging in name_usage() function.
Add {} within if else. Indentation added as well.
@sckott
Copy link
Contributor

sckott commented Mar 7, 2018

thanks! i'll have a look

R/name_usage.r Outdated
# each of these args must be length=1
if (!is.null(rank)) stopifnot(length(rank) == 1)
if (!is.null(name)) stopifnot(length(name) == 1)
if (!is.null(language)) stopifnot(length(language) == 1)
if (!is.null(datasetKey)) stopifnot(length(datasetKey) == 1)

args <- rgbif_compact(list(offset = start, limit = limit,
sourceId = sourceId))
sourceId = sourceId,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since now there are no repeated parameters that GBIF allows on this route, can you delete the args <- c(args, rank, datasetKey, name, language) line and put each of name, rank, and language into the rgbif_compact(list(...)) call?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. 4fee1b2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And similarly: c51d7c3

R/name_usage.r Outdated
outout[[iter]] <- tt
}
out <- list()
out$results <- purrr::map(outout, ~purrr::splice(.$results))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't want to add a new pkg dependency unless there is very good reason to do so. Is the purrrr code doing something like lapply(outout, "[[", "results") ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. e88243a.
i copied your way of merging .results in occ_search. It works perfectly. So, purrr is removed.

R/name_usage.r Outdated
numreturned <- length(tt$results)}
sumreturned <- sumreturned + numreturned
# if less results than maximum
if ((numreturned > 0) & (numreturned < 1000)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we want a && here instead of & since it doesn't look like you're looking for a vectorized output?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done here: 4a7b004

@@ -232,3 +232,42 @@ test_that("fails with more than 1", {
"length\\(rank\\) == 1 is not TRUE")
})


# paging
test_that("paging: class data and meta not modified by paging", {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks much for adding tests! Though can you do tests that don't take as much time as these? Before these new tests the test for this file take about 10 sec, and after take about 30 sec

of course once we have test caching (coming soon via vcr pkg nearly to cran) then the long running time won't matter so much since most of time is in the http request which will be cached

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. It takes too much time.
Unfortunately it's impossible to make it shorter because gbif takes chunks of 1000 and takes time for doing it.
I have commented all tests about paging (a6742f9). I added the suggestion to uncomment them after test caching will be implemented via vcr package.

@sckott
Copy link
Contributor

sckott commented Mar 8, 2018

  • feel free to change defaul value of start to 0, if you can update the docs too that would be great

offset is always in args, see line 95.
There is no good reason to introduce purrr package.
numreturned is not a vector. && is better.
Testing paging takes a lot of time. I commented them. Uncomment after introducing test caching via vcr package.
Example was updated in .R file but I forgot devtools::document().
@damianooldoni
Copy link
Collaborator Author

Thanks @sckott for your fast answer! Very appreciated.
I replied inline.
About the default start: I changed it to 0 here: bbc2342. Documentation updated as well here: cafa3d5.
Let me know if I need to change something more. Thanks.

@sckott
Copy link
Contributor

sckott commented Mar 10, 2018

thanks, looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants