Discussion about this post

Alexandre Passos

I love the revised CIFAR plot and I think about it a lot.

One dumb phenomenological way of thinking about curves like that is to assume we can predict the pass rate of a model from two factors: an intrinsic model capability and an intrinsic problem difficulty. If you assume a simple P(model solves problem) = sigmoid(capability - difficulty), and you approximate the sigmoid with a linear function, you get exactly this type of behavior: looking at an ensemble of models on a fixed problem set, you'll see a line, and switching problem sets changes the slope, with all the lines meeting at 100% accuracy.

This doesn't explain why the revised CIFAR is reliably harder than the original CIFAR, however. But it does explain behavior I've seen in LLMs, where, broadly speaking, a large number of unrelated benchmarks evaluated over many models have a PCA of surprisingly low dimension, so you're better off picking a small number of metrics to look at and ignoring the rest.
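That two-factor story can be simulated directly. Here is a toy sketch (the set sizes, the difficulty distributions, and the +1.0 difficulty shift for the "revised" set are all made up for illustration):

```python
import math
import random

random.seed(0)

def solve_prob(capability, difficulty):
    # P(model solves problem) = sigmoid(capability - difficulty)
    return 1.0 / (1.0 + math.exp(-(capability - difficulty)))

def pass_rate(capability, difficulties):
    # Expected accuracy of one model over a fixed problem set.
    return sum(solve_prob(capability, d) for d in difficulties) / len(difficulties)

# Two hypothetical problem sets: the "revised" set is uniformly harder.
original = [random.gauss(0.0, 1.0) for _ in range(500)]
revised = [d + 1.0 for d in original]  # shift every difficulty up

# An ensemble of models with varying capability.
capabilities = [random.gauss(1.0, 1.5) for _ in range(20)]
pairs = [(pass_rate(c, original), pass_rate(c, revised)) for c in capabilities]
```

Plotting `pairs` (accuracy on the original set vs. accuracy on the revised set) gives an approximately straight line sitting below the diagonal, and as capability grows both coordinates approach 100%, which is where the lines for different problem sets meet.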

I think this is how IQ was first defined?

That said I still cannot explain why the new line is lower than the old line.

Tom Dietterich

In ecology, our model often has two parts: an observation model and a biological model. For example, in species distribution models, we want to know whether a particular location (described by a vector of "habitat covariates") is occupied by a particular species. This could be viewed as a simple classification problem: f(habitat) = P(occupied | habitat). However, our observations are made by humans who visit the location and spend some amount of effort looking to see if the site is occupied. The probability that they will detect the species given that the site is occupied depends on a set of observation covariates that may include some habitat covariates (density of shrubs) as well as covariates for effort and weather (and possibly, degree of observer skill): g(obscovariates) = P(detection | site is occupied). The likelihood function is therefore something like P(detection | site is occupied) * P(occupied | habitat). This is known as the Occupancy Model, and we need to estimate the parameters of both f and g from the data. This estimation is quite delicate, because there are trivial solutions (e.g., all sites are occupied and all negative observations are due to low detection probability; or detection probability is 1.0 and all negative observations are due to bad habitat).
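A minimal sketch of that likelihood for one site, assuming repeated visits and logistic forms for f and g (the parameter names and values are hypothetical, not from the comment):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def site_likelihood(psi, p, history):
    """Likelihood of one site's detection history.

    psi:     P(occupied | habitat), from the biological model f
    p:       P(detection | occupied), from the observation model g
    history: 0/1 detections over repeated visits to the site
    """
    if any(history):
        # At least one detection: the site must be occupied.
        lik = psi
        for y in history:
            lik *= p if y else (1.0 - p)
        return lik
    # All non-detections: occupied but always missed, or truly unoccupied.
    return psi * (1.0 - p) ** len(history) + (1.0 - psi)

# Illustrative linear-logistic forms for f and g (coefficients made up):
def f(habitat, beta0=-0.5, beta1=1.2):
    return sigmoid(beta0 + beta1 * habitat)

def g(effort, alpha0=-1.0, alpha1=0.8):
    return sigmoid(alpha0 + alpha1 * effort)
```

The all-zeros branch is where the delicacy lives: a history of non-detections mixes "occupied but missed" with "truly unoccupied," and it is the repeated visits per site that make psi and p separately identifiable, ruling out the trivial solutions described above.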

Two questions: First, is it useful to view this as an extension of your "design-based ML" to include a measurement model? Second, shouldn't most ML analyses include an explicit measurement model? We are accustomed to just dumping all of the covariates into a system and estimating h(obscovariates, habitatcovariates), but this loses the causal structure of the observation process.
