Threshold or probability modeling for TRED

by John Leatherman, Kansas State University

What is probability modeling?

As the name implies, probability modeling is a regional economic modeling technique that predicts the likelihood that an establishment in a given industry sector will appear in a particular location over a period of time. For example, it may indicate that the probability of growth in grain mill products manufacturing in Smith County, KS between 2008 and 2016 is 27%. This kind of prediction is typically carried out for a large number of economic sectors to create a “modeling system.”

Why is this helpful information to have?

What matters is the relative ranking of probabilities across a range of economic sectors. If there is a 76% chance that a community will see growth in household furniture manufacturing and only a 7% chance that it will see growth in farm and garden machinery manufacturing, this information can help establish local economic development priorities. Further, if some classes of activity, such as finance and business services, cluster near the top of the ranking while others, such as manufacturing, cluster near the bottom, this says something about the likely efficacy of industry targeting strategies when the objective of local economic development is to increase local jobs and income.

How does the system work?

The system applies logistic regression analysis in a two-step procedure. The first step is to count the number of industry starts in a community (county) during a period such as 2000 to 2008, coding a sector as one if its number of establishments increased and as zero otherwise (i.e., if the number did not change or decreased). We then use the community’s characteristics in 2000 to explain the growth pattern with a logistic regression.
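The first step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimation code used for any published system: the establishment counts, the two community characteristics, and the helper names (`code_growth`, `fit_logit`) are all hypothetical, and the fit uses plain gradient ascent rather than a statistical package.

```python
import math

def code_growth(count_start, count_end):
    """Code a sector as 1 if its establishment count rose, else 0."""
    return 1 if count_end > count_start else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, y, lr=0.5, epochs=5000):
    """Fit an intercept and slopes by gradient ascent on the mean log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)  # beta[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for row, target in zip(X, y):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            err = target - p
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

# Hypothetical establishment counts for one sector in eight counties: (2000, 2008).
counts = [(3, 5), (2, 2), (4, 3), (1, 2), (0, 0), (5, 7), (6, 6), (2, 3)]
y = [code_growth(start, end) for start, end in counts]

# Hypothetical 2000 characteristics, expressed as shares between 0 and 1:
# [population with more than a high school education, manufacturing employment]
X_2000 = [[0.6, 0.2], [0.3, 0.4], [0.4, 0.5], [0.7, 0.3],
          [0.2, 0.6], [0.5, 0.1], [0.5, 0.3], [0.35, 0.45]]

beta = fit_logit(X_2000, y)
print("coded outcomes:", y)
print("intercept and slopes:", [round(b, 3) for b in beta])
```

A real system would use several dozen characteristics, many more counties, and one such model per industry sector.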

The second step assumes that the structural relationships between a community’s growth patterns and its characteristics remain constant, but that the community’s attributes change over time in ways that affect the probability of growth in different industry sectors. To determine these probabilities, we take the regression coefficients from the model estimated on 2000 characteristics and pair them with the community characteristics as they exist in 2008. This generates a probability that growth in a given economic sector will occur over the next nine-year period, from 2008 to 2016. A separate model is estimated for each of the industry sectors included in the system.
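The second step amounts to plugging newer characteristics into the logistic function with the older coefficients held fixed. The sketch below assumes two hypothetical sector models with made-up coefficients and a single hypothetical county; `predict_probability` is an illustrative helper, not part of any published system.

```python
import math

def predict_probability(beta, characteristics):
    """Logistic prediction: beta[0] is the intercept, beta[1:] are the slopes."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], characteristics))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients estimated from 2000 characteristics, one model per sector.
sector_models = {
    "grain mill products": [-1.0, 3.0, -2.0],
    "household furniture": [0.5, 1.0, -1.0],
}

# Hypothetical 2008 characteristics for one county, as shares between 0 and 1:
# [population with more than a high school education, manufacturing employment]
county_2008 = [0.5, 0.25]

# Pair the old coefficients with the new characteristics, sector by sector.
probabilities = {
    sector: predict_probability(beta, county_2008)
    for sector, beta in sector_models.items()
}
for sector, p in probabilities.items():
    print(f"{sector}: {p:.2f}")
```

The coefficients never change between the two steps; only the community characteristics are updated.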

You say the community’s characteristics influence these probabilities. What characteristics are you talking about?

There are many community characteristics that cumulatively affect the likelihood that a given type of economic activity will locate in a particular place. We start by thinking about broad classes of characteristics such as the community’s economic characteristics, its social attributes, the community’s infrastructure, and industry production input requirements and market access requirements. We then characterize the community’s economic characteristics with specific measures such as percentage manufacturing employment, percentage service employment, age and value of the housing stock, and per capita taxes. Similarly, we characterize industry input requirements with measures of the local labor force, e.g., population density, educational attainment, labor force participation rates, and average earnings per job. All told, several dozen or more community characteristics may be included.

Where has this type of modeling been done before?

One example is the Northeast Industrial Targeting Model and Economic Development Database System developed by Goode and Hastings (1989) in the mid-1980s. They matched industry requirements with community characteristics for 69 manufacturing sectors in 730 non-metropolitan communities in the northeastern United States. More recently, Leatherman, Howard and Kastens (2004; TRED chapter 7) developed the Plains Economic Targeting System for 78 industry sectors and 414 counties in six Great Plains states.

Tell me more specifically how to use the information generated by the model.

Such a modeling system has multiple uses. It can help us understand both community and regional prospects and inform our economic development policies.

For a given community (county), we can start by examining the discrete probability values for each industry sector included in the system. If we thought, for example, that our region was favorable for food processing and that was a primary focus for economic development efforts, we might observe the probability of a new grain mill establishment was 81%, preserved fruits and vegetables was 31%, meat products was 10%, and dairy products was 0%. This can help us to decide whether we are appropriately focused in our economic development strategy, or how we may need to refine our focus within the strategy.

Secondly, we can array all of the probabilities from highest to lowest to see which broader classes of activities are most likely to appear. Again, if our objective for economic development is to generate more local jobs and income and we observe that the highest probability activities tend to be finance and business services while the lowest probability activities are in the manufacturing sector, we can ask ourselves whether we are allocating our scarce time and resources appropriately.
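Arraying the probabilities is a simple sort, and averaging by class shows where each broader group sits. The sector names, class labels, and probability values below are hypothetical and chosen only to illustrate the pattern described above.

```python
from collections import defaultdict

# Hypothetical model output for one county: (sector, broader class, probability).
results = [
    ("business services", "services", 0.81),
    ("depository institutions", "finance", 0.74),
    ("grain mill products", "manufacturing", 0.27),
    ("household furniture", "manufacturing", 0.12),
    ("farm and garden machinery", "manufacturing", 0.07),
    ("insurance agents", "finance", 0.66),
]

# Array all sectors from highest to lowest probability.
ranked = sorted(results, key=lambda row: row[2], reverse=True)

# Average the probabilities within each broader class of activity.
by_class = defaultdict(list)
for _sector, activity_class, p in results:
    by_class[activity_class].append(p)
class_means = {name: sum(ps) / len(ps) for name, ps in by_class.items()}

for sector, activity_class, p in ranked:
    print(f"{p:.0%}  {sector} ({activity_class})")
print({name: round(mean, 2) for name, mean in class_means.items()})
```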

We can go a step further by transforming the industry regression model coefficients into what are called “marginal impacts.” These show us the discrete effect each of the variables in our model has on the probability calculated for an industry sector. We might see, for example, that for each 1% increase in the population with more than a high school education, the probability of attracting computer and data processing centers increases by 5%. This points to local economic development policies that can indirectly make our community more attractive to a desired target industry. Maybe our “smartest” economic development strategy is to focus on adult education rather than direct industrial recruitment.
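One common way to obtain a marginal impact from a logistic model is the derivative of the predicted probability with respect to a variable, which equals the coefficient times p(1 - p). The text does not say exactly how the system computes its marginal impacts, so the sketch below is an assumption about the method, and the variable names, coefficients, and baseline probability are hypothetical.

```python
def marginal_impact(beta_j, p):
    """Change in predicted probability per one-unit change in variable j,
    evaluated at baseline probability p (derivative of the logistic function)."""
    return beta_j * p * (1.0 - p)

# Hypothetical coefficients for one sector; each variable is measured in percentage points.
coefficients = {
    "pct_more_than_high_school": 0.2,
    "pct_manufacturing_employment": -0.08,
    "per_capita_taxes": 0.01,
}
baseline_probability = 0.5  # hypothetical predicted probability for the county

impacts = {
    name: marginal_impact(beta_j, baseline_probability)
    for name, beta_j in coefficients.items()
}

# The variable to which this sector is most sensitive, in absolute terms.
most_sensitive = max(impacts, key=lambda name: abs(impacts[name]))

for name, effect in impacts.items():
    print(f"{name}: {effect:+.4f}")
print("most sensitive to:", most_sensitive)
```

With these made-up numbers, a one-point rise in the education measure raises the predicted probability by about 0.05, or five percentage points. Because the effect depends on p, the same coefficient yields a smaller marginal impact in a county whose baseline probability is close to 0 or 1.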

We can use the marginal impact information to better understand the needs of both existing industries and targeted industries represented in the system. If we have two or three dozen explanatory variables in the model, we can see to which of those variables an industry sector is most sensitive. This insight can be just as valuable to local retention and expansion programs as it is to industry targeting.

Finally, we can take a step back to see broader regional prospects by examining industry probabilities for multiple counties or across an entire state. This will give us a visual sense of whether a regional location is more or less conducive to an activity or, perhaps, the extent to which spatial competition may already exist. By correlating the probabilities across industry sectors, we can see which activities tend to cluster together or to repel one another.
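Correlating probabilities across sectors can be sketched with a plain Pearson correlation computed over counties. The three sectors, five counties, and probability values below are hypothetical; a positive coefficient suggests two activities tend to be favored by the same places, and a negative one suggests the opposite.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists of probabilities."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical predicted probabilities for the same five counties, by sector.
probabilities = {
    "grain mill products": [0.81, 0.60, 0.35, 0.20, 0.10],
    "meat products": [0.70, 0.55, 0.40, 0.15, 0.05],
    "computer and data processing": [0.05, 0.20, 0.45, 0.60, 0.90],
}

# Correlate every pair of sectors across counties.
sectors = list(probabilities)
for i, first in enumerate(sectors):
    for second in sectors[i + 1:]:
        r = pearson(probabilities[first], probabilities[second])
        print(f"{first} vs {second}: {r:+.2f}")
```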

Further details are provided in the accompanying edited volume (Targeting Regional Economic Development, Chapter 7).


Goode, F.M. and S.E. Hastings (1989) “An Evaluation of the Predictive Ability of the Northeast Industrial Targeting (NIT) and Economic Development Database (EDD) System,” Unpublished, The Pennsylvania State University.

Leatherman, J.C., Howard, D.J. and T.L. Kastens (2004) “Improved Prospects of Rural Development: An Industrial Targeting System for the Great Plains,” Review of Agricultural Economics, 24(1): 59-77.