EFIGE

EFIGE (European Firms in a Global Economy) was a joint project of eight European academic institutions, supported by the European Commission’s Directorate-General for Research through its 7th Framework Programme, and coordinated by Bruegel. The goal of the project was to better understand how European firms are affected by globalization and to draw policy conclusions. The project ran from September 2008 to August 2012, with survey data collected in early 2010 in seven European countries: Austria, France, Germany, Hungary, Italy, Spain, and the United Kingdom. Most of the information was collected as a cross-section for the last available budget year (2008), while some questions cover the years 2007 to 2009.

The dataset includes a wide range of qualitative and quantitative information. The focus is on international operations (exports, imports, outsourcing, FDI, etc.), while other topics include investment, R&D and innovation, proprietary structure, governance, composition of the workforce, finance, pricing behavior, and response to the crisis. The total number of variables is around 150.

The surveyed firms constitute a representative sample of the population of manufacturing firms with more than 10 employees in each country, stratified by industry, region, and size class. Large firms were oversampled because of their importance in the aggregate economy. The final sample has more than 2,200 firms for the UK, around 3,000 firms for each of the other four major economies, and some 500 firms for each of the two smaller countries, Austria and Hungary.
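Because large firms were oversampled, unweighted sample averages would overstate the role of large firms relative to the population. A minimal sketch of the usual correction follows, assuming design weights equal to the population count divided by the sample count in each stratum. The strata, counts, and firm sizes are made up for illustration and are not EFIGE figures; the function name design_weights is hypothetical, and how weights are provided in the actual dataset should be checked in its documentation.

```python
from collections import Counter


def design_weights(population_counts, sample_strata):
    """Return one weight per sampled firm: N_h / n_h for its stratum h."""
    sample_counts = Counter(sample_strata)
    return [population_counts[s] / sample_counts[s] for s in sample_strata]


# Hypothetical strata (size classes) and counts, not EFIGE figures.
population = {"small": 900, "large": 100}
sample = ["small"] * 30 + ["large"] * 20  # large firms are oversampled

weights = design_weights(population, sample)

# Made-up employment figures: 20 for each small firm, 400 for each large firm.
employment = [20] * 30 + [400] * 20

unweighted_mean = sum(employment) / len(employment)
weighted_mean = sum(w * e for w, e in zip(weights, employment)) / sum(weights)

print(unweighted_mean, weighted_mean)
```

In this toy example the weighted mean is much lower than the unweighted one, because each sampled small firm stands in for more firms in the population than each sampled large firm.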

Information on access to the EFIGE dataset can be found here. The publicly available version has been anonymised according to standard practices in the field. The full version can be matched to the Amadeus firm panel from Bureau van Dijk, which enables both validation and the estimation of firm-specific measures of productivity.

For a detailed description of the EFIGE data, see:

Altomonte, Carlo & Tommaso Aquilante (2012). The EU-EFIGE/Bruegel-Unicredit dataset. mimeo


fDi Markets

fDi Markets is an online database maintained by the fDi Intelligence division of the Financial Times Ltd, which tracks cross-border greenfield investments for all sectors and countries worldwide. The database is updated using information from media announcements on the time, location, industry, and monetary value of each investment, and on the number of jobs created. The locations of both the parent company and the local investing company participating in an investment project are identified at city-level precision. In addition, fDi provides some details on the purpose and market orientation of investments. The database is constructed so that any later expansion of an existing investment is counted as a separate observation.
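Since expansions are recorded as separate observations, the number of rows can exceed the number of distinct investment sites. The sketch below illustrates one way to group observations, assuming hypothetical field names (parent, dest_city, year) and made-up records; it does not reflect the actual fDi Markets schema.

```python
from collections import defaultdict


def group_by_site(observations):
    """Group observation years by (parent company, destination city)."""
    grouped = defaultdict(list)
    for obs in observations:
        grouped[(obs["parent"], obs["dest_city"])].append(obs["year"])
    return {key: sorted(years) for key, years in grouped.items()}


# Made-up records; company names and cities are illustrative only.
observations = [
    {"parent": "Alpha AG", "dest_city": "Gyor", "year": 2005},
    {"parent": "Alpha AG", "dest_city": "Gyor", "year": 2011},  # later expansion
    {"parent": "Beta SpA", "dest_city": "Lyon", "year": 2008},
]

sites = group_by_site(observations)
n_observations = len(observations)
n_sites = len(sites)

print(n_observations, n_sites)
```

Grouping by parent and city is only a rough proxy for a single investment; a parent may run several unrelated projects in the same city.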

Of particular interest to the VSVK group is the sample of the fDi Markets database covering firms that originate from and invest within the 28 member states of the European Union, starting from 2003 (as of June 2015, the sample contained about 22,900 investment projects). The EU-28 sample contains 8,894 different parent companies and 11,751 different local investing companies from 36 industry sectors. The sample also classifies FDI by 269 sub-sectors and 17 clusters/activities.

For a more detailed description of the fDi Markets database and matching it to Amadeus, see:

Békés, Gábor & Márta Bisztray (2016). FDI Investors and their investments – firm level investigations with EU28 data: Data description with an example. Unpublished manuscript, IE CERS HAS.


Hungarian real estate transactions

The housing price data from NTCA is a transaction-level dataset on the Hungarian housing market. The full dataset covers the entire territory of Hungary for the period from 2000 to 2014. Depending on the location of the real estate unit, the dataset can be a useful source of information on transactions, including the year of purchase, the purchase price documented in the transaction, living area and lot area, building type (detached house, terraced house, condominium, or flat in a block of flats), and the location of the lot (based on zip codes). In addition, the dataset contains variables describing various characteristics of a property, such as the number of rooms, the existence of a balcony, and the year of construction; however, the practical use of these variables is limited because they are often left unfilled.

For the analysis conducted by members of the VSVK group, a variety of filtering techniques were applied to the dataset. The need for filtering was largely driven by inaccurate data recording, which resulted in occasional implausible price values and incomplete area data.
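The specific filters are not described here. As a rough illustration only, the sketch below drops transactions with missing or non-positive price or area and those with an implausible price per square meter. The field names (price, living_area_m2) and the thresholds are assumptions chosen for the example, not the rules actually applied by the group.

```python
def filter_transactions(rows, min_price_per_m2=10_000, max_price_per_m2=2_000_000):
    """Keep rows with a usable price and area and a plausible price per m2.

    Thresholds are hypothetical values in HUF per square meter.
    """
    kept = []
    for row in rows:
        price = row.get("price")
        area = row.get("living_area_m2")
        # Drop incomplete or non-positive values.
        if price is None or area is None or price <= 0 or area <= 0:
            continue
        # Drop implausible unit prices.
        if not (min_price_per_m2 <= price / area <= max_price_per_m2):
            continue
        kept.append(row)
    return kept


# Made-up transactions for illustration.
sample = [
    {"id": 1, "price": 15_000_000, "living_area_m2": 60},    # plausible
    {"id": 2, "price": 1, "living_area_m2": 80},             # token price
    {"id": 3, "price": 20_000_000, "living_area_m2": None},  # missing area
    {"id": 4, "price": 900_000_000, "living_area_m2": 30},   # extreme unit price
]

clean = filter_transactions(sample)
kept_ids = [row["id"] for row in clean]
print(kept_ids)
```

In practice, thresholds would more likely be set relative to the distribution of prices within a year and location than as fixed values.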

For a more detailed description of this dataset, see:

Békés, Gábor, Áron Horváth & Zoltán Sápi (2016). Housing prices – Hungarian micro data. Unpublished manuscript, IE CERS HAS.


Hungarian accessibility database

This database, compiled for VSVK by Terra-Laky Kft., provides information on road connections in Hungary. For every pair of Hungarian settlements and districts of Budapest (approximately 10.1 million pairs, or, alternatively, a 3,178 x 3,178 matrix), it includes the duration of the quickest route by car, its total length and its breakdown by road type, and an ordered list of the roads used. The dataset covers 1996, 2000, 2005, 2010, 2013, 2014, and 2015. An important trend in this period was a decrease in average travel times combined with a slight increase in the average length of quickest routes, mainly driven by the construction of highways and expressways.

The optimal routes are not derived from actual transport data, but calculated by an algorithm using data on the road network from the corresponding year. Comparison of routes from the original version of this database, from two online route planners, and from one for GPS devices was used to improve the quality of the data. The main advantage of the dataset compared to scraping similar data from some route planning interface is its panel structure, which enables the analysis of the impact of infrastructural investments on labor and real estate markets.
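To make the size and the panel structure concrete, the sketch below checks the pair count implied by a 3,178 x 3,178 matrix and computes the change in travel time between two years for each origin-destination pair. The settlement labels, the travel times, and the dictionary layout are made up for illustration; the actual file format of the database is not described here.

```python
# Number of ordered origin-destination pairs in a 3,178 x 3,178 matrix.
n_units = 3178
n_pairs = n_units * n_units  # 10,099,684, i.e. roughly 10.1 million


def travel_time_changes(times_by_year, base_year, end_year):
    """Return end-year minus base-year travel time for pairs present in both."""
    base = times_by_year[base_year]
    end = times_by_year[end_year]
    return {pair: end[pair] - base[pair] for pair in base if pair in end}


# Made-up travel times in minutes for two hypothetical pairs.
times = {
    1996: {("A", "B"): 95.0, ("A", "C"): 40.0},
    2015: {("A", "B"): 70.0, ("A", "C"): 40.0},
}

changes = travel_time_changes(times, 1996, 2015)
mean_change = sum(changes.values()) / len(changes)

print(n_pairs, changes, mean_change)
```

A pair-level change of this kind is the sort of variable that could be linked to labor or real estate market outcomes in a panel analysis.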

The accessibility data is supplemented by a list of the settlements with the coordinates used by the route planning algorithm to represent them (junction closest to the centroid or the arguably most important junction), and data on roads constructed between 1995 and 2015.

For a more detailed description of this dataset, see [in Hungarian]:

Békés, Gábor & Zoltán Sápi (2016). Települések közötti közúti elérési viszonyok 1996-2015. Unpublished manuscript, IE CERS HAS.