Identifying Nascent High-Growth Firms Using Machine Learning
Predicting which firms will grow quickly, and why, has been the subject of research for many decades. Firms that grow rapidly have the potential to usher in new innovations, products or processes (Kogan et al. 2017), become superstar firms (Haltiwanger et al. 2013) and affect the aggregate labour share (Autor et al. 2020; De Loecker et al. 2020). We explore the use of supervised machine learning techniques to identify a population of nascent high-growth firms using Canadian administrative firm-level data. We apply a suite of supervised machine learning algorithms (elastic net, random forest and neural network) to determine whether a large set of variables can predict which firms will become high-growth firms over the next three years. These variables comprise financial and employment data from Canadian firm tax filings, state variables (e.g., industry, geography) and indicators of firm complexity (e.g., multiple industrial activities, foreign ownership). The results suggest that the machine learning classifiers can select a sub-population of nascent high-growth firms that includes the majority of actual high-growth firms plus a group of firms that shared similar attributes but failed to attain high-growth status.
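To make the last claim concrete, the following is a minimal sketch of how a selected sub-population can capture most actual high-growth firms while also including firms that do not attain that status. It is illustrative only: the scores, the 0.5 threshold, the firm indices and the function names are all hypothetical and are not taken from the study's data or models.

```python
def select_candidates(scores, threshold):
    """Return the indices of firms whose predicted score meets the threshold."""
    return {i for i, s in enumerate(scores) if s >= threshold}


def recall_and_precision(selected, actual):
    """Share of actual high-growth firms captured, and share of selections that are correct."""
    true_positives = len(selected & actual)
    recall = true_positives / len(actual) if actual else 0.0
    precision = true_positives / len(selected) if selected else 0.0
    return recall, precision


# Hypothetical classifier scores for ten firms (assumed values, not real output).
scores = [0.91, 0.15, 0.72, 0.40, 0.66, 0.08, 0.55, 0.30, 0.83, 0.20]
# Hypothetical indices of firms that actually became high-growth firms.
actual_high_growth = {0, 2, 7, 8}

selected = select_candidates(scores, threshold=0.5)
recall, precision = recall_and_precision(selected, actual_high_growth)
false_positives = selected - actual_high_growth

print(f"selected firms: {sorted(selected)}")
print(f"recall: {recall:.2f}, precision: {precision:.2f}")
print(f"selected but not high-growth: {sorted(false_positives)}")
```

In this toy case the selected group contains three of the four actual high-growth firms, along with two firms that scored similarly but did not attain high-growth status. Lowering the threshold would raise recall at the cost of admitting more such firms.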