A1 Veneto Workers History

Our data come from the Veneto Workers History (VWH) file, which provides social security-based earnings records on annual job spells for all workers employed in the Italian region of Veneto at any point between 1975 and 2001. Each job-year spell in the VWH lists a start date, an end date, the number of days worked that year, and the total wage compensation received by the employee in that year. The earnings records are not top-coded. We also observe the gender of each worker and several geographic variables indicating the location of each employer. See Card, Devicienti, and Maida (2014) and Serafinelli (2017) for additional discussion and analysis of the VWH.
Abowd, J. M., F. Kramarz, and D. N. Margolis (1999). High wage workers and high wage firms. Econometrica 67(2), 251–333.
Abowd, J. M., R. H. Creecy, F. Kramarz, et al. (2002). Computing person and firm effects using linked longitudinal employer-employee data. Technical report, Center for Economic Studies, US Census Bureau.
Achlioptas, D. (2001). Database-friendly random projections. In Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 274–281. ACM.
Akritas, M. G. and N. Papadatos (2004). Heteroscedastic one-way ANOVA and lack-of-fit tests. Journal of the American Statistical Association 99(466), 368–382.
Anatolyev, S. (2012). Inference in regression models with many regressors. Journal of Econometrics 170(2), 368–382.
Andrews, D. W. (1988). Chi-square diagnostic tests for econometric models: theory. Econometrica: Journal of the Econometric Society, 1419–1453.
Andrews, I. and A. Mikusheva (2016). A geometric approach to nonlinear econometric models.
Andrews, M. J., L. Gill, T. Schank, and R. Upward (2008). High wage workers and low wage firms: negative assortative matching or limited mobility bias? Journal of the Royal Statistical Society: Series A (Statistics in Society) 171(3), 673–697.
Angrist, J. D. (2014). The perils of peer effects. Labour Economics 30, 98–108.
Angrist, J., G. Imbens, and A. Krueger (1999). Jackknife instrumental variables estimation. Journal of Applied Econometrics 14(1), 57–67.
Arcidiacono, P., G. Foster, N. Goodpaster, and J. Kinsler (2012). Estimating spillovers using panel data, with an application to the classroom. Quantitative Economics 3(3), 421–470.
Arellano, M. and S. Bonhomme (2011). Identifying distributional characteristics in random coefficients panel data models. The Review of Economic Studies 79(3), 987–1020.
Arriaga, R. I. and S. Vempala (1999). An algorithmic theory of learning: Robust concepts and random projection. In Foundations of Computer Science, 1999. 40th Annual Symposium on, pp. 616–623. IEEE.
Bonhomme, S. (2017). Econometric analysis of bipartite networks. Mimeo.
Bonhomme, S., T. Lamadon, and E. Manresa (2017a). Discretizing unobserved heterogeneity. Unpublished manuscript, University of Chicago.
Bonhomme, S., T. Lamadon, and E. Manresa (2017b). A distributional framework for matched employer employee data. Unpublished manuscript, University of Chicago.
Borovičková, K. and R. Shimer (2017). High wage workers work for high wage firms. Technical report, National Bureau of Economic Research.
Bryk, A. S. and S. W. Raudenbush (1992). Hierarchical linear models for social and behavioral research: Applications and data analysis methods.
Card, D., A. R. Cardoso, J. Heining, and P. Kline (2018). Firms and labor market inequality: Evidence and some theory. Journal of Labor Economics 36(S1), S13–S70.
Card, D., F. Devicienti, and A. Maida (2014). Rent-sharing, holdup, and wages: Evidence from matched panel data. The Review of Economic Studies 81(1), 84–111.
Card, D., J. Heining, and P. Kline (2013). Workplace heterogeneity and the rise of West German wage inequality. The Quarterly Journal of Economics 128(3), 967–1015.
Cattaneo, M. D., M. Jansson, and W. K. Newey (2016). Alternative asymptotics and the partially linear model with many regressors. Econometric Theory, 1–25.
Cattaneo, M. D., M. Jansson, and W. K. Newey (2017). Inference in linear regression models with many covariates and heteroskedasticity. Journal of the American Statistical Association (just accepted).
Chao, J. C., J. A. Hausman, W. K. Newey, N. R. Swanson, and T. Woutersen (2014). Testing overidentifying restrictions with many instruments and heteroskedasticity. Journal of Econometrics 178, 15–21.
Chao, J. C., N. R. Swanson, J. A. Hausman, W. K. Newey, and T. Woutersen (2012). Asymptotic distribution of JIVE in a heteroskedastic IV regression with many instruments. Econometric Theory 28(01), 42–86.
Chatterjee, S. (2008). A new method of normal approximation. The Annals of Probability 36(4), 1584–1610.
Chetty, R., J. N. Friedman, N. Hilger, E. Saez, D. W. Schanzenbach, and D. Yagan (2011). How does your kindergarten classroom affect your earnings? Evidence from Project STAR. The Quarterly Journal of Economics 126(4), 1593–1660.
Chung, F. R. (1997). Spectral graph theory. Number 92. American Mathematical Soc.
Cochran, W. G. (1980). Fisher and the analysis of variance. In RA Fisher: An Appreciation, pp. 17–34. Springer.
Computing ≈234,000 firm effects and ≈2,200,000 worker effects took approximately 11 seconds with the CMG solver on a 64-core machine with 256 GB of dedicated RAM. By contrast, the method suggested in Card et al. (2013) took approximately 34 seconds.

Algorithm 2: Fast Approximation of Statistical Leverages
1: function leverage(X, ε)
2:   Let $p = 24 \log k / \varepsilon^2$.
3:   Construct $Q$ as a random $\pm 1/\sqrt{p}$ Rademacher matrix of dimensions $p \times n$.
4:   Compute $\Upsilon = QX$.
5:   Let $\xi_\kappa$ denote the $\kappa$'th row of $\Upsilon$.
6:   for $\kappa = 1, \ldots, p$ do
7:     Solve the system $\dot L \tilde z_\kappa = \xi_\kappa'$
8:   end for
9:   Build $\tilde Z = (\tilde z_1, \ldots, \tilde z_p)'$
10:  Approximate each $P_{ii}$ as $\lVert \tilde Z (e_g - e_{N+j(g,t)}) \rVert^2$
11: end function

Our procedure also requires computation of $B_{ii} = x_i' S_{xx}^{-1} A S_{xx}^{-1} x_i$. When $A$ is used to estimate variance components, we can rewrite $B_{ii}$ as $B_{ii} = (e_g - e_{N+j(g,t)})'$
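The NumPy sketch below follows the steps of Algorithm 2 on a toy worker-firm panel. It makes several assumptions. Workers use 0-based indices $g$, and firm $j$ is stored in column $N + j$. The helper names `design_matrix`, `approx_leverages` and `exact_leverages` are hypothetical. Rounding $p$ up to an integer is also an assumption. Each solve in line 7 uses a dense `np.linalg.lstsq` call, which returns the minimum-norm solution $\dot L^{\dagger} \xi_\kappa'$, rather than the CMG solver. The sketch therefore only illustrates the logic and says nothing about speed at scale.

```python
import numpy as np


def design_matrix(worker_ids, firm_ids, n_workers, n_firms):
    """One row per person-year: x_i = e_g - e_{N+j(g,t)} (0-based indices)."""
    n = len(worker_ids)
    X = np.zeros((n, n_workers + n_firms))
    rows = np.arange(n)
    X[rows, np.asarray(worker_ids)] = 1.0
    X[rows, n_workers + np.asarray(firm_ids)] = -1.0
    return X


def approx_leverages(X, eps, rng):
    """Random-projection approximation of P_ii = x_i' L^+ x_i."""
    n, k = X.shape
    L = X.T @ X                                    # weighted Laplacian S_xx
    p = int(np.ceil(24 * np.log(k) / eps**2))      # line 2 (rounded up)
    Q = rng.choice([-1.0, 1.0], size=(p, n)) / np.sqrt(p)  # line 3
    Upsilon = Q @ X                                # line 4: p x k
    Z = np.empty((p, k))
    for kappa in range(p):                         # lines 6-8
        # Minimum-norm solution of L z = xi_kappa' (stand-in for CMG).
        Z[kappa] = np.linalg.lstsq(L, Upsilon[kappa], rcond=None)[0]
    # Line 10 for every observation at once: row i of X @ Z.T is (Z x_i)'.
    return np.sum((X @ Z.T) ** 2, axis=1)


def exact_leverages(X):
    """Diagonal of the hat matrix X (X'X)^+ X', for comparison."""
    return np.einsum("ij,jk,ik->i", X, np.linalg.pinv(X.T @ X), X)
```

On a connected toy panel, the exact leverages should sum to $\operatorname{rank}(\dot L)$, which is one less than the number of workers plus firms. The random approximations should fall within a factor $1 \pm \varepsilon$ of the exact values with high probability.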
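Condition (ii): The conditions of Lemma 1 in Andrews and Mikusheva (2016) are satisfied. To verify this, take the manifold $\tilde S = \{\dot x \in \mathbb{R}^{q+1} : \tilde g(\dot x) = 0\}$ for
$$\tilde g(\dot x) = \dot x' \hat\Sigma_q^{1/2} \begin{bmatrix} D_q & 0 \\ 0 & 0 \end{bmatrix} \hat\Sigma_q^{1/2} \dot x + \left(2E[\hat b_q]', 1\right) \begin{bmatrix} D_q & 0 \\ 0 & 1 \end{bmatrix} \hat\Sigma_q^{1/2} \dot x.$$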
Davidson, R. and J. G. MacKinnon (1993). Estimation and inference in econometrics.
Dhaene, G. and K. Jochmans (2015). Split-panel jackknife estimation of fixed-effect models. The Review of Economic Studies 82(3), 991–1030.
Donald, S. G., G. W. Imbens, and W. K. Newey (2003). Empirical likelihood estimation and consistent tests with conditional moment restrictions. Journal of Econometrics 117(1), 55–93.
Drineas, P., M. Magdon, M. W. Mahoney, and D. P. Woodruff (2012). Fast approximation of matrix coherence and statistical leverage. Journal of Machine Learning Research 13(Dec), 3475–3506.
Dufour, J.-M. and J. Jasiak (2001). Finite sample limited information inference methods for structural equations and models with generated regressors. International Economic Review 42(3), 815–844.
Efron, B. and C. Stein (1981). The jackknife estimate of variance. Ann. Statist. 9(3), 586–596.
Fast computation of these scores is an active area of research in computer science (see, e.g., the discussion in Drineas et al., 2012). We use recent advances in this area to illustrate how these scores can be computed efficiently in two-way fixed effects models. Without loss of generality, we can write the model of Section 7.2 as $y_i = x_i'\beta + \varepsilon_i$, where $x_i = (d_i', -\dot f_i')'$, $\dot f_i = (\mathbf{1}\{j(g,t)=0\}, \ldots, \mathbf{1}\{j(g,t)=J\})'$, $\beta = (\alpha', -\psi')'$, and $\psi = (\psi_0, \ldots, \psi_J)$. It is easy to verify that in this case $S_{xx} = \dot L$, where $\dot L$ is the weighted Laplacian associated with the bipartite graph $G$ formed by workers and firms. This implies that
$$P_{ii} = x_i' S_{xx}^{\dagger} x_i = \dot L^{\dagger}_{g,g} + \dot L^{\dagger}_{N+j(g,t),N+j(g,t)} - 2\dot L^{\dagger}_{g,N+j(g,t)} = (e_g - e_{N+j(g,t)})' \dot L^{\dagger} (e_g - e_{N+j(g,t)}) = (e_g - e_{N+j(g,t)})'$$
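The Laplacian identity for $P_{ii}$ can be checked numerically on a toy panel. The check below assumes 0-based worker indices $g$ and firm columns stored at $N + j$. It builds the design with rows $x_i = e_g - e_{N+j(g,t)}$ and confirms that $X'X$ has zero row sums, as a graph Laplacian must. It then compares the three-term formula with the diagonal of the hat matrix $X (X'X)^{\dagger} X'$. The quadratic form $(e_g - e_{N+j})' \dot L^{\dagger} (e_g - e_{N+j})$ is the effective resistance between worker $g$ and firm $j$. The helper names are hypothetical, and the dense pseudoinverse is only sensible for small graphs.

```python
import numpy as np


def design_matrix(worker_ids, firm_ids, n_workers, n_firms):
    """One row per person-year: x_i = e_g - e_{N+j(g,t)} (0-based indices)."""
    n = len(worker_ids)
    X = np.zeros((n, n_workers + n_firms))
    rows = np.arange(n)
    X[rows, np.asarray(worker_ids)] = 1.0
    X[rows, n_workers + np.asarray(firm_ids)] = -1.0
    return X


def leverages_from_laplacian(worker_ids, firm_ids, n_workers, n_firms):
    """P_ii = L+_{g,g} + L+_{N+j,N+j} - 2 L+_{g,N+j} for each observation."""
    X = design_matrix(worker_ids, firm_ids, n_workers, n_firms)
    L_pinv = np.linalg.pinv(X.T @ X)           # S_xx = X'X is the Laplacian
    g = np.asarray(worker_ids)
    f = n_workers + np.asarray(firm_ids)       # firm j sits in column N + j
    return L_pinv[g, g] + L_pinv[f, f] - 2.0 * L_pinv[g, f]
```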
Finkelstein, A., M. Gentzkow, and H. Williams (2016). Sources of geographic variation in health care: Evidence from patient migration. The Quarterly Journal of Economics 131(4), 1681–1726.
Fisher, R. A. (1925). Statistical methods for research workers. Genesis Publishing Pvt Ltd.
Graham, B. S. (2008). Identifying social interactions through conditional variance restrictions. Econometrica 76(3), 643–660.
Graham, B. S. and J. L. Powell (2012). Identification and estimation of average partial effects in irregular correlated random coefficient panel data models. Econometrica 80(5), 2105–2152.
Graham, B. S., J. Hahn, A. Poirier, and J. L. Powell (2016). A quantile correlated random coefficients panel data model. Technical report, cemmap working paper, Centre for Microdata Methods and Practice.
Hahn, J. and W. Newey (2004). Jackknife and analytical bias reduction for nonlinear panel models.
Hildreth, C. and J. P. Houck (1968). Some estimators for a linear model with random coefficients. Journal of the American Statistical Association 63(322), 584–595.
Horn, S. D., R. A. Horn, and D. B. Duncan (1975). Estimating heteroscedastic variances in linear models. Journal of the American Statistical Association 70(350), 380–385.
Istat (2001). Il sistema produttivo del veneto. Technical report.
Jochmans, K. and M. Weidner (2016). Fixed-effect regressions on network data. arXiv preprint arXiv:1608.01532.
Karoui, N. E. and E. Purdom (2016). Can we trust the bootstrap in high-dimension? arXiv preprint arXiv:1608.00696.
Koutis, I., G. L. Miller, and D. Tolliver (2011). Combinatorial preconditioners and multilevel solvers for problems in computer vision and image processing. Computer Vision and Image Understanding 115(12), 1638–1646.
Kuh, E. (1959). The validity of cross-sectionally estimated behavior equations in time series applications. Econometrica: Journal of the Econometric Society, 197–214.
$\dot L^{\dagger} A \dot L^{\dagger} (e_g - e_{N+j(g,t)}) = \lVert A^{1/2} \dot L^{\dagger} (e_g - e_{N+j(g,t)}) \rVert^2$. Hence, one can approximate $B_{ii}$ via a simple modification of Algorithm 2. When $A$ is used to estimate covariance components, we can rewrite $A$ as in Lemma 3, that is, $A = A_1'A_2$. Note that a simple corollary of the JLL is that inner products are preserved under random projections (see, for instance, Corollary 2 in Arriaga and Vempala, 1999).
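The modification for $B_{ii}$ can be sketched with small dense matrices. The sketch assumes the user supplies a factorization $A = A_1'A_2$. In the variance case, $A_1 = A_2$ can be any $W$ with $A = W'W$, and $B_{ii}$ reduces to the squared norm above. The same Rademacher projection $R$ is applied to both factors. Because $E[R'R] = I$, the projected inner products are then unbiased for the exact ones. The function names are hypothetical, and the pseudoinverse stands in for the Laplacian solves.

```python
import numpy as np


def design_matrix(worker_ids, firm_ids, n_workers, n_firms):
    """One row per person-year: x_i = e_g - e_{N+j(g,t)} (0-based indices)."""
    n = len(worker_ids)
    X = np.zeros((n, n_workers + n_firms))
    rows = np.arange(n)
    X[rows, np.asarray(worker_ids)] = 1.0
    X[rows, n_workers + np.asarray(firm_ids)] = -1.0
    return X


def b_ii_exact(X, A):
    """B_ii = x_i' L^+ A L^+ x_i for every observation i."""
    M = X @ np.linalg.pinv(X.T @ X)     # row i is (L^+ x_i)'
    return np.sum((M @ A) * M, axis=1)


def b_ii_sketch(X, A1, A2, p, rng):
    """JL approximation of x_i' L^+ A1' A2 L^+ x_i with one shared projection."""
    L_pinv = np.linalg.pinv(X.T @ X)
    R = rng.choice([-1.0, 1.0], size=(p, A1.shape[0])) / np.sqrt(p)
    # In a large application each row of R @ A1 and R @ A2 would instead
    # be the right-hand side of one Laplacian solve.
    Z1 = R @ A1 @ L_pinv                 # p x k
    Z2 = R @ A2 @ L_pinv
    return np.sum((X @ Z1.T) * (X @ Z2.T), axis=1)
```

The error of each sketched $B_{ii}$ scales with $\lVert A_1 \dot L^{\dagger} x_i \rVert \, \lVert A_2 \dot L^{\dagger} x_i \rVert / \sqrt{p}$. This is why the number of projections, rather than the number of observations, drives the cost.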
The largest connected set is the largest sample in which all the associated firms are connected by worker mobility. The leave-out sample is the largest connected set in which every firm remains connected after removing any single edge (mover); see Appendix B for details. We further pruned this sample by removing any stayer belonging to the firms associated with the mover with the highest Lindeberg condition. A mover is defined as a worker who switched firms between 1999 and 2001. Statistics on log daily wages are person-year weighted. Source: VWH dataset.
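The leave-out set can be sketched in pure Python under the edge definition used in this note. Each mover is assumed to contribute one edge between an origin and a destination firm. A set of firms stays connected after dropping any single mover exactly when none of its edges is a bridge. The sketch finds the bridges with Tarjan's depth-first search, deletes them, and keeps the largest remaining component. The function names and the "largest by number of firms" rule are assumptions for illustration. The sketch does not implement the stayer pruning or the exact procedure of Appendix B.

```python
from collections import defaultdict


def find_bridges(edges):
    """Return ids of bridge edges in a multigraph given as (firm, firm) pairs."""
    adj = defaultdict(list)
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    disc, low, bridges, timer = {}, {}, set(), 0
    for root in list(adj):
        if root in disc:
            continue
        disc[root] = low[root] = timer
        timer += 1
        stack = [(root, -1, iter(adj[root]))]     # iterative DFS (Tarjan)
        while stack:
            node, parent_eid, neighbors = stack[-1]
            for nbr, eid in neighbors:
                if eid == parent_eid:             # skip only the tree edge, so
                    continue                      # two movers on one pair form a cycle
                if nbr in disc:
                    low[node] = min(low[node], disc[nbr])
                else:
                    disc[nbr] = low[nbr] = timer
                    timer += 1
                    stack.append((nbr, eid, iter(adj[nbr])))
                    break
            else:                                 # all neighbors done: retreat
                stack.pop()
                if stack:
                    parent = stack[-1][0]
                    low[parent] = min(low[parent], low[node])
                    if low[node] > disc[parent]:
                        bridges.add(parent_eid)
    return bridges


def leave_out_firms(edges):
    """Largest firm set that stays connected after dropping any one mover."""
    cut = find_bridges(edges)
    adj, firms = defaultdict(set), set()
    for eid, (u, v) in enumerate(edges):
        firms.update((u, v))
        if eid not in cut:
            adj[u].add(v)
            adj[v].add(u)
    seen, best = set(), set()
    for start in sorted(firms):
        if start in seen:
            continue
        component, frontier = {start}, [start]
        seen.add(start)
        while frontier:
            u = frontier.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    frontier.append(w)
        if len(component) > len(best):
            best = component
    return best
```

For example, consider a triangle of firms 0, 1, 2 that is linked to firm 3 by a single mover. Firm 3 drops out because losing that one mover would disconnect it.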
Lei, L., P. J. Bickel, and N. E. Karoui (2016). Asymptotics for high dimensional regression M-estimates: Fixed design results. arXiv preprint arXiv:1612.06358.
MacKinnon, J. G. and H. White (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics 29(3), 305–325.
Mohar, B. (1989). Isoperimetric numbers of graphs. Journal of Combinatorial Theory, Series B 47(3), 274–291.
Moulton, B. R. (1986). Random group effects and the precision of regression estimates. Journal of Econometrics 32(3), 385–397.
Newey, W. K. and J. R. Robins (2018). Cross-fitting and fast remainder rates for semiparametric estimation. arXiv preprint arXiv:1801.09138.
Phillips, G. D. A. and C. Hale (1977). The bias of instrumental variable estimators of simultaneous equation systems. International Economic Review, 219–228.
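… $\sum_{\ell=1}^{J} \dot\lambda_\ell^{-2} \le \frac{4}{(\sqrt{J}\,\dot\lambda_J)^2}$ since $\dot\lambda_\ell \le 2$ (Chung, 1997, Lemma 1.7). An algebraic definition of $C$ is
$$C = \min_{X \subseteq \{0,\ldots,J\}:\ \sum_{j\in X}\dot D_{jj} \le \frac{1}{2}\sum_{j=0}^{J}\dot D_{jj}} \ \frac{-\sum_{j\in X}\sum_{k\notin X} S_{\Delta\dot f\Delta\dot f,jk}}{\sum_{j\in X}\dot D_{jj}},$$
and it follows from the Cheeger inequality that $\dot\lambda_J \ge 1 - \sqrt{\cdots}$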
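The spectral facts used here can be checked numerically on a toy firm graph. The check assumes, since the fragment does not say so explicitly, that the $\dot\lambda_\ell$ are the nonzero eigenvalues of the normalized Laplacian $\dot D^{-1/2} S_{\Delta\dot f\Delta\dot f} \dot D^{-1/2}$, with $\dot\lambda_J$ the smallest. It also assumes that the truncated Cheeger bound has the standard form $1 - \sqrt{1 - C^2}$ from Chung (1997). The code confirms three things. All eigenvalues lie in $[0, 2]$. This bound implies $\sum_\ell \dot\lambda_\ell^{-2} \ge J/4$. The smallest nonzero eigenvalue respects the Cheeger-type lower bound. $C$ is found by brute force, which is feasible only for a handful of firms.

```python
import itertools

import numpy as np


def normalized_laplacian_eigenvalues(W):
    """Eigenvalues of D^{-1/2} (D - W) D^{-1/2}, sorted ascending."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.linalg.eigvalsh(d_inv_sqrt @ (np.diag(d) - W) @ d_inv_sqrt)


def cheeger_constant(W):
    """Brute-force C = min over X with vol(X) <= vol/2 of cut(X) / vol(X)."""
    d = W.sum(axis=1)
    n, half = len(d), d.sum() / 2.0
    best = np.inf
    for size in range(1, n):
        for X in itertools.combinations(range(n), size):
            vol = d[list(X)].sum()
            if vol > half:
                continue
            rest = [k for k in range(n) if k not in X]
            # Crossing weight equals minus the off-diagonal Laplacian entries.
            best = min(best, W[np.ix_(X, rest)].sum() / vol)
    return best
```

Two triangles of firms joined by a single mover give $C = 1/7$. In that case the lower bound $1 - \sqrt{48/49} \approx 0.01$ is loose but valid.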
Powell, J. L., J. H. Stock, and T. M. Stoker (1989). Semiparametric estimation of index coefficients.
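Proof. The following two conditions are the inputs to the proof of Theorem 2 in Andrews and Mikusheva (2016), from which it follows that
$$\liminf_{n\to\infty} P\left(\theta \in \hat C^{q}_{\theta}\right) = \liminf_{n\to\infty} P\left( \min_{(\dot b_q', \theta_q)' \in B} \begin{pmatrix} \hat b_q - \dot b_q \\ \hat\theta_q - \theta_q \end{pmatrix}' \hat\Sigma_q^{-1} \begin{pmatrix} \hat b_q - \dot b_q \\ \hat\theta_q - \theta_q \end{pmatrix} \le z^2_{\hat\kappa} \right) \ge 1 - \alpha,$$
where $B = \left\{ (\dot b_q', \theta_q)' : \sum_{\ell=1}^{q} \lambda_\ell \dot b_{q,\ell}^2 + \theta_q - \theta = 0 \right\}$.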
Quenouille, M. H. (1949). Approximate tests of correlation in time-series. Journal of the Royal Statistical Society. Series B (Methodological) 11(1), 68–84.
Rao, C. R. (1970). Estimation of heteroscedastic variances in linear models. Journal of the American Statistical Association 65(329), 161–172.
Raudenbush, S. and A. S. Bryk (1986). A hierarchical model for studying school effects. Sociology of Education, 1–17.
Table 1: Comparing Samples and Places

| | Rovigo [1] | Belluno [2] | Rovigo-Belluno [3] |
|---|---|---|---|
| Largest Connected Set | | | |
| Number of Observations | 43,330 | 63,462 | 106,964 |
| Number of Movers | 5,061 | 7,921 | 13,022 |
| Number of Firms | 2,579 | 3,131 | 5,732 |
| Mean Log Daily Wage | 4.6089 | 4.7482 | 4.6917 |
| Variance Log Daily Wage | 0.1560 | 0.1256 | 0.1427 |
| Leave-Out Sample (Pruned) | | | |
| Number of Observations | 32,848 | 56,044 | 89,666 |
| Number of Movers | 3,531 | 6,414 | 9,972 |
| Number of Firms | 1,282 | 1,684 | 2,974 |
| Mean Log Daily Wage | 4.6015 | 4.7636 | 4.7047 |
| Variance Log Daily Wage | 0.1674 | 0.1245 | 0.1465 |
| Maximum Leverage ($\max_i P_{ii}$) | 0.9241 | 0.9085 | 0.9236 |

Note: Data in each column correspond to person-year observations in the years 1999 and 2001 belonging to a given province in Veneto; the last column represents the union of the Rovigo and Belluno provinces.
Sacerdote, B. (2001). Peer effects with random assignment: Results for Dartmouth roommates. The Quarterly Journal of Economics 116(2), 681–704.
Sølvsten, M. (2017). Robust estimation with many instruments. Unpublished manuscript, University of Wisconsin - Madison.
Scheffé, H. (1959). The analysis of variance. Technical report.
Searle, S. R., G. Casella, and C. E. McCulloch (2009). Variance components, Volume 391. John Wiley & Sons.
Serafinelli, M. (2017). Good firms, worker flows and local productivity.
Sherman, J. and W. J. Morrison (1950). Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. The Annals of Mathematical Statistics 21(1), 124–127.
Silver, D. W. (2016). Essays on labor economics and health care.
Song, J., D. J. Price, F. Guvenen, N. Bloom, and T. Von Wachter (2017). Firming up inequality. Technical report, National Bureau of Economic Research.
Sorkin, I. (2017). Ranking firms using revealed preference. Technical report, National Bureau of Economic Research.
Spielman, D. A. and N. Srivastava (2011). Graph sparsification by effective resistances. SIAM Journal on Computing 40(6), 1913–1926.
Swamy, P. A. (1970). Efficient inference in a random coefficient regression model. Econometrica: Journal of the Econometric Society, 311–323.
The JLL implies that we can $\varepsilon$-approximate all the statistical leverages in our bipartite graph by solving only a logarithmic number ($p$) of linear systems. Algorithm 2, taken from Spielman and Srivastava (2011), illustrates how to approximate the statistical leverages associated with the model of Section 7.2. To implement the solution step in line 7 of the algorithm, we take advantage of the "CMG" solver of Koutis et al. (2011) for symmetric diagonally dominant linear systems.
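The solve step only needs a fast solver for Laplacian systems. The sketch below swaps the CMG solver for plain, unpreconditioned conjugate gradient in NumPy. This is not what Koutis et al. (2011) implement, and it will be much slower on large graphs. `laplacian_from_edges` and `laplacian_cg` are hypothetical names. The sketch relies on the right-hand side summing to zero. That condition holds for each $\xi_\kappa'$ because every $x_i = e_g - e_{N+j(g,t)}$ sums to zero, so the singular system is consistent.

```python
import numpy as np


def laplacian_from_edges(edges, n_nodes):
    """Graph Laplacian D - W; each repeated (worker, firm) pair adds weight 1."""
    L = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L


def laplacian_cg(L, b, tol=1e-12, max_iter=None):
    """Conjugate gradient for L z = b on a connected graph, b summing to zero.

    Starting from z = 0 keeps the iterates orthogonal to the all-ones null
    vector, so the centered result matches the minimum-norm solution L^+ b.
    """
    max_iter = max_iter or 10 * len(b)
    z = np.zeros(len(b))
    r = b.astype(float)              # residual b - L z at z = 0
    d = r.copy()                     # search direction
    rs = r @ r
    threshold = tol * max(np.linalg.norm(b), 1.0)
    for _ in range(max_iter):
        if np.sqrt(rs) <= threshold:
            break
        Ld = L @ d
        alpha = rs / (d @ Ld)
        z += alpha * d
        r -= alpha * Ld
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return z - z.mean()              # remove round-off drift along the null vector
```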
To construct the person-year panel used in our analysis, we closely follow the sample selection procedures described in Card, Heining, and Kline (2013). First, we drop employment spells in which the worker's age lies outside the range 20–60. The average worker in this sample has 1.21 jobs per year. To generate unique worker-firm assignments in each year, we restrict attention to spells associated with "dominant jobs", in which the worker earned the most in the corresponding year. From this person-year file, we then exclude workers who (i) report a daily wage of less than 5 real euros or have zero days worked (1.5% of remaining person-year observations), (ii) report a change in log daily wage from one year to the next that is greater than 1 in absolute value (6%), (iii) are employed in the public sector (10%), or (iv) have more than 10 jobs in any year or have missing gender (0.1%).
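The selection steps can be sketched in plain Python. This is an illustration under assumptions, not the authors' code. The spell field names are hypothetical, and earnings are assumed to be already deflated to real euros. The daily wage is taken to be earnings divided by days worked. "Exclude workers" is read as dropping every person-year of a flagged worker.

```python
import math


def select_person_years(spells):
    """Sketch of the person-year selection; field names are hypothetical.

    Each spell is a dict with keys worker, year, age, earnings (real euros),
    days, public (bool) and gender (None if missing).
    """
    spells = [s for s in spells if 20 <= s["age"] <= 60]   # age 20-60
    # Dominant job: highest-earning spell per worker-year; also count jobs.
    n_jobs, dominant = {}, {}
    for s in spells:
        key = (s["worker"], s["year"])
        n_jobs[key] = n_jobs.get(key, 0) + 1
        if key not in dominant or s["earnings"] > dominant[key]["earnings"]:
            dominant[key] = s
    flagged, log_wage = set(), {}
    for (w, y), s in dominant.items():
        # (i) zero days worked or daily wage below 5 real euros
        if s["days"] == 0 or s["earnings"] / s["days"] < 5.0:
            flagged.add(w)
            continue
        # (iii) public sector; (iv) more than 10 jobs or missing gender
        if s["public"] or s["gender"] is None or n_jobs[(w, y)] > 10:
            flagged.add(w)
        log_wage[(w, y)] = math.log(s["earnings"] / s["days"])
    # (ii) |change in log daily wage| > 1 between consecutive years
    for (w, y), lw in log_wage.items():
        prev = log_wage.get((w, y - 1))
        if prev is not None and abs(lw - prev) > 1.0:
            flagged.add(w)
    return {key: s for key, s in dominant.items() if key[0] not in flagged}
```

Counting jobs before the dominant-job step matters for rule (iv). The raw spell count is what identifies workers with more than 10 jobs in a year.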
Verdier, V. (2016). Estimation and inference for linear models with two-way unobserved heterogeneity and sparsely matched data. Technical report, Mimeo.
Whitford, J. (2001). The decline of a model? Challenge and response in the Italian industrial districts. Economy and Society 30(1), 38–65.
Woodbury, M. A. (1949). The stability of out-input matrices. Chicago, IL 9.
Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. MIT Press.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research 20(7), 557–585.