The Digital Layer: How Innovative Firms Relate on the Web
13 Pages. Posted: 4 Feb 2020. Last revised: 7 Feb 2020.
Date Written: 2020
In this paper, we introduce the concept of a Digital Layer to empirically investigate inter-firm relations at any geographical scale of analysis. The Digital Layer is created from large-scale, structured web scraping of firm websites, their textual content, and the hyperlinks among them. In a case study of 500,000 firms based in Germany, we use text-based machine learning models to show that this Digital Layer can be used to derive meaningful characteristics for the more than seven million firm-to-firm relations that we analyze. Among other things, we explore three dimensions of relational proximity:
(1) Cognitive proximity is measured by the similarity between firms’ website texts.
(2) Organizational proximity is measured by classifying the nature of the firms’ relationships (business vs. non-business) using a text-based machine learning classification model.
(3) Geographical proximity is calculated using the exact geographic location of the firms.
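The text-based and location-based measures in (1) and (3) can be illustrated with a minimal sketch. The abstract does not specify how text similarity or distance is computed, so the choices here are assumptions made only for illustration: cosine similarity over raw term counts stands in for the cognitive proximity measure, and great-circle (haversine) distance stands in for the geographical one. The function names and the example firms are hypothetical. The organizational proximity measure in (2) requires a trained classifier and is not sketched.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between term-count vectors of two texts.

    Assumption: plain term counts; the paper's actual text model
    is not described in the abstract.
    """
    counts_a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    counts_b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    # Counter returns 0 for terms missing from the other text.
    dot = sum(n * counts_b[term] for term, n in counts_a.items())
    norm_a = math.sqrt(sum(n * n for n in counts_a.values()))
    norm_b = math.sqrt(sum(n * n for n in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no defined direction
    return dot / (norm_a * norm_b)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two coordinates.

    Assumption: a spherical Earth with a mean radius of 6371 km.
    """
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical pair of linked firms (texts and coordinates are made up).
firm_a = {"text": "We build industrial sensors and automation software.",
          "lat": 52.52, "lon": 13.405}
firm_b = {"text": "Automation software and sensors for industrial plants.",
          "lat": 48.137, "lon": 11.575}

cognitive = cosine_similarity(firm_a["text"], firm_b["text"])
distance_km = haversine_km(firm_a["lat"], firm_a["lon"],
                           firm_b["lat"], firm_b["lon"])
print(f"text similarity: {cognitive:.2f}, distance: {distance_km:.0f} km")
```

In this sketch a higher similarity score indicates greater cognitive proximity, while a smaller distance indicates greater geographical proximity. Applied to every hyperlinked firm pair, the two functions would yield one value per relation for each dimension.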
Finally, we use these variables to explore the differences between innovative and non-innovative firms with regard to their location and relations within the Digital Layer. The firm-level innovation indicators in this study come from traditional sources (survey and patent data) and from a novel deep learning-based approach that harnesses firm website texts. We find that, after controlling for a range of firm-level characteristics, innovative firms maintain more relationships than non-innovative firms and that their partners are more innovative than those of non-innovative firms. Innovative firms are located in dense areas, yet they maintain relationships with partners that are geographically farther away. Their partners share a common knowledge base and their relationships are business-focused. We conclude that the Digital Layer is a suitable and highly cost-efficient method for conducting large-scale analyses of firm networks that are not constrained to specific sectors, regions, or a particular geographical level of analysis. As such, our approach complements other relational datasets, such as patent or survey data.
Keywords: Web Mining, Innovation, Proximity, Network, Natural Language Processing
JEL Classification: O30, R10, C80