BigSurv18: Big Data Meets Survey Science Conference and Supercomputing Center in Barcelona

Conference

Recently, I represented 7N at the BigSurv18 conference in Barcelona. The conference title was Big Data Meets Survey Science, and it was the first conference aimed at exploring new statistical frontiers at the intersection of Survey Science and Big Data. The conference was organized by the European Survey Research Association (ESRA) and was held at the Research and Expertise Centre for Survey Methodology (RECSM) at the Universitat Pompeu Fabra in Barcelona.

The participants represented major academic centers, global companies, and governmental agencies, for example: Columbia University, University of Chicago, University of Maryland, University of Michigan, University of Mannheim, University of Essex, NYU, Australian National University, University of Illinois, University of Southampton, RTI International, Kantar Public, GESIS, CentERdata/Tilburg University, Max Planck Institute, SurveyMonkey, Qualtrics, Uber, Gallup, GfK, Ipsos, Google, YouTube, Facebook, Civis Analytics, U.S. Census Bureau, Statistics Canada, Office for National Statistics UK, Statistics Netherlands, Statistics Finland, Statistics Norway, EUSTAT, OECD, and others.

The conference gathered many participants and presenters who are well known in the worlds of Survey Science and/or Big Data. Among them was Julia Lane, a professor at New York University and the recipient of more than 70 million USD in grants, whose current research focuses on Big Data’s role in government and public policy. Additionally, R users were strongly represented - one of the presenters was Maciej Beręsewicz, the main organizer of the first eRum (European R User Conference) in Poznań, Poland.

Conference Topics

Nowadays, there are more and more barriers to conducting survey research projects. The major ones are decreasing response rates and the increasing cost of gathering declarative data from respondents. As a result, many survey methodologists are looking towards Data Science and Big Data, trying to supplement or even replace traditional surveys with new methods and technologies while still gaining useful insights. There are many challenges to overcome (such as methodological issues, low data quality, and privacy concerns), yet some approaches seem particularly promising. The trending topics at the conference included:

  • ethical and privacy challenges and considerations,
  • using Big Data in Official Statistics,
  • using new technologies for passive data collection,
  • using data from Social Media (text-mining, process-mining, targeting recruitment),
  • merging data from different sources (survey data, administrative data, big data, passive data, web-scraped data),
  • using Big Data to improve surveys and vice versa.

Interestingly, there is no consensus on how to approach ethical and privacy challenges in the world of Survey Data and Big Data. For many, the EU General Data Protection Regulation (GDPR) is far from being the optimal solution. Some claim that the responsibility for protecting privacy should be moved from data producers to those who use the data for different purposes. Nevertheless, many speakers seemed to agree that the problem is not a binary one. It is more of a continuum between full privacy on one side and full utility of the data on the other. However, it was highlighted that the problem is far more complex, because what brings full utility for most people can be harmful for a few, and even those few should be protected in some way. To sum up, one thing is certain - much more discussion is still needed on this topic in many countries, both in the US and in Europe, and among various professions (including lawyers, politicians, social, computer, and data scientists, philosophers, and people who are concerned about their privacy).

New Paradigms in On-line Declarative Data Collection

At the conference I gave a talk about new paradigms in on-line declarative data collection. I argued that off-line research paradigms are still dominant, even in the world of on-line research techniques, because most on-line surveys are essentially off-line questionnaires converted into more or less advanced on-line HTML forms. We need to seek, and be willing to accept, new paradigms in social science research techniques that will allow us to develop new, native on-line research tools designed to build long-term relationships with our respondents. Although almost all attention is still focused on more traditional approaches to current problems in Survey Science, in my opinion the change has already started. These days, Social Scientists, Data Scientists, and Computer Scientists are working together more often than ever to find new, interdisciplinary, and innovative methods and tools for conducting important research projects that aim to benefit whole societies.

Barcelona Supercomputing Center

Aside from the main conference program, visiting “the most beautiful data center in the world” - Barcelona Supercomputing Center - was one of the main attractions of the conference. Created in 2005, the center now hosts MareNostrum 4, one of the most powerful supercomputers in Europe (described by experts as “the world’s most diverse supercomputer” due to the heterogeneity of its architecture). MareNostrum is known for allocating 32 million core hours to the Nobel Prize-winning research on detecting gravitational waves. Currently, MareNostrum can perform at 13.7 petaflops (13,700 trillion floating-point operations per second). Every four years the supercomputers are upgraded in pursuit of the European Commission's goal of developing exascale supercomputers. The Center has more than 500 employees from 45 countries. The total budget for 2017 was almost 37 million EUR. Supercomputing enables scientific experiments that could not be performed in the real world because they would be too expensive, too dangerous, or simply impossible.
