In this paper, a method is developed to automatically search online for business websites. Subsequently, a statistical learning model is used to predict, for each population unit, the probability that a retrieved website belongs to that particular business. Next, the most likely hit is selected. The method is applied to a random sample of business units with 10 or more employees.
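The selection step can be illustrated with a minimal sketch. It assumes a logistic model. The three features (`name_similarity`, `postcode_found`, `search_rank`) and the coefficients in `WEIGHTS` are hypothetical placeholders, not the model or variables used in this paper. Each candidate website receives a probability, and for every unit the candidate with the highest probability is kept.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    unit_id: str            # identifier of the population unit (business)
    domain: str             # candidate website domain found by the search
    name_similarity: float  # hypothetical feature: 0..1 similarity of business name and page text
    postcode_found: bool    # hypothetical feature: the unit's postcode appears on the page
    search_rank: int        # hypothetical feature: position in the search results (1 = top)


# Hypothetical coefficients; in practice these are estimated on a labelled training set.
WEIGHTS = {"intercept": -3.0, "name_similarity": 4.0, "postcode_found": 2.0, "search_rank": -0.3}


def match_probability(c: Candidate) -> float:
    """Logistic model: P(domain belongs to unit) = 1 / (1 + exp(-z))."""
    z = (WEIGHTS["intercept"]
         + WEIGHTS["name_similarity"] * c.name_similarity
         + WEIGHTS["postcode_found"] * float(c.postcode_found)
         + WEIGHTS["search_rank"] * c.search_rank)
    return 1.0 / (1.0 + math.exp(-z))


def most_likely_hit(candidates):
    """Return, per unit_id, the (candidate, probability) pair with the highest probability."""
    best = {}
    for c in candidates:
        p = match_probability(c)
        if c.unit_id not in best or p > best[c.unit_id][1]:
            best[c.unit_id] = (c, p)
    return best


# Illustrative, made-up candidates for a single unit "A".
example = [
    Candidate("A", "bakkerij-jansen.nl", 0.9, True, 1),
    Candidate("A", "bedrijvengids.example", 0.5, False, 2),
]
best = most_likely_hit(example)
```

With these made-up inputs, the first candidate scores z = 2.3 (probability ≈ 0.91) and the second scores z = -1.6 (probability ≈ 0.17), so `bakkerij-jansen.nl` is selected for unit "A".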
Enterprise websites are a promising source of information for official business statistics. To use them, we need to know which website addresses (URLs) belong to which businesses. More specifically, we are interested in the link between domains (a domain is part of a URL) and legal units. We started by linking domains obtained from the Chamber of Commerce and from an external company to a set of legal units. Next, we developed a method to automatically search online for domains using identifying information of legal units, such as their name and address. This results in a set of candidate domains, from which we select the domain with the highest probability of belonging to the legal unit, as estimated by a trained statistical learning model.
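Before candidates can be compared with the domains already linked to legal units, search-result URLs must be reduced to domains. The sketch below makes a simplifying assumption: the domain is taken to be the lowercased host name with a leading `www.` removed. A production version would more likely use the Public Suffix List to find the registered domain (for example, to handle suffixes such as `co.uk`). The helper names `to_domain` and `candidate_domains` are hypothetical.

```python
from urllib.parse import urlsplit


def to_domain(url: str) -> str:
    """Simplified domain extraction: lowercased host name without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url  # urlsplit only finds the host when a scheme is present
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    return host


def candidate_domains(result_urls):
    """Map search-result URLs to unique domains, keeping the best (first) rank per domain."""
    ranks = {}
    for rank, url in enumerate(result_urls, start=1):
        d = to_domain(url)
        if d and d not in ranks:
            ranks[d] = rank
    return ranks
```

Several result pages often point to the same site. Keeping only the first occurrence means each candidate domain appears once, together with its best search rank. That rank could then serve as one input feature for the statistical learning model.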