4th International Workshop on Performance Analysis of Big data Systems (PABS)

April 9-13, 2018
Berlin, Germany

In conjunction with the 9th International Conference on Performance Engineering (ICPE 2018)


We are seeing exponential growth in data generated by platforms such as social media, multimedia, enterprise systems and the Internet of Things. It is becoming increasingly difficult to manage, analyze, visualize, model, store and search big data. At the same time, big data processing systems such as MapReduce, Spark, HBase, Hive, Cassandra, Bigtable, Pregel and MongoDB are growing in complexity, diversity, number of deployments and capabilities. A big data system may use new operating system designs, advanced data processing algorithms, application parallelization, high performance computing architectures such as GPUs, and clusters to improve performance. Traditional systems are also being upgraded to co-locate with popular big data technologies.

The workshop on performance analysis of big data systems (PABS) aims to provide a platform for scientific researchers, academicians and practitioners to discuss techniques, models, benchmarks, tools, case studies and experiences in dealing with performance issues in traditional and big data systems. The primary objective is to discuss performance bottlenecks and improvements in big data analysis using different paradigms, architectures and big data technologies. We propose to use this platform as an opportunity to discuss systems, architectures, tools and optimization algorithms that are parallel in nature and hence make use of advancements to improve system performance. The workshop will focus on the performance challenges imposed by big data systems and on the state-of-the-art solutions proposed to overcome them. Accepted papers will be published in the ACM proceedings and the ACM Digital Library.


All novel performance analysis or prediction techniques, benchmarks, architectures, models and tools for data-intensive computing systems that optimize application performance on cutting-edge high performance solutions are of interest to the workshop. Topics include, but are not limited to:

  • Performance analysis and optimization of Big data systems and technologies
  • Big data analytics using machine learning
  • In-memory analysis of big data
  • Performance-assured migration of traditional systems to Big data platforms
  • Deployment of Big data technologies and applications on high performance computing architectures
  • Case studies and benchmarks to optimize or evaluate the performance of Big data applications and systems, and Big data workload characterization
  • Tools or models to identify performance bottlenecks and/or predict performance metrics in Big data
  • Performance analysis of querying, visualizing and processing large network datasets on clusters of multicore and many-core processors and accelerators
  • Performance issues in heterogeneous computing for Big data architectures
  • Analysis of Big data applications in science, engineering, finance, business, healthcare, telecommunication and other domains
  • Data structures and algorithms for performance optimization in Big data systems
  • Data intensive computing
  • Tools for big data analytics and management


  • Submissions due: January 19, 2018 (extended from January 15, 2018)
  • Notification of acceptance: February 12, 2018
  • Camera-ready copies due: February 18, 2018
  • Workshop Date: April 09, 2018


9:00am - 9:05am
Welcome by Chairs

9:05am - 10:00am
Keynote - Challenges in Benchmarking Big Data and Machine Learning Systems
Tilmann Rabl

10:00am - 10:30am
Paper 1 - Investigation of Replication Factor for Performance Enhancement in Hadoop Distributed File System
Hilmi Egemen Ciritoglu, Leandro De Almeida, Eduardo Cunha De Almeida, Teodora Sandra Buda, John Murphy and Christina Thorpe

10:30am - 11:00am
Coffee break

11:00am - 12:00pm
Invited Talk 1 - Model-based Performance Evaluation of Batch and Stream Applications for Big Data
Johannes Kross

12:00pm - 12:30pm
Paper 2 - Exploratory Analysis of Spark Structured Streaming
Todor Ivanov and Jason Taaffe

12:30pm - 2:00pm
Lunch break

2:00pm - 3:00pm
Presentation 1 - Factors Affecting Machine Learning Algorithms Selection
Boris Zibitsker and Dominique Heger

3:00pm - 3:30pm
Coffee break

3:30pm - 4:00pm
Invited Talk 2 - Benchmarking Distributed Stream Data Processing Systems
Jeyhun Karimov

4:00pm - 4:30pm
Closing - PABS Summary and Discussions
Rekha Singhal and Dheeraj Chahal


[ To be announced ]


Submissions describing original, unpublished recent results related to the workshop theme, up to 6 pages in standard ACM format, can be submitted through the EasyChair conference system.

In case of any difficulty, please contact d dot chahal at tcs dot com or rekha dot singhal at tcs dot com. All submissions must be in PDF format. Accepted technical papers will be included in the ACM Digital Library.


  • Dheeraj Chahal, Low Latency Analytics, TCS Research, India.
  • Rekha Singhal, Low Latency Analytics, TCS Research, India.


  • Amy Apon, Clemson University, USA
  • Arindam Pal, TCS Research, India
  • Evgenia Smirni, College of William and Mary, USA
  • Giuliano Casale, Imperial College London, UK
  • Nicolas Poggi, BSC, Amsterdam
  • Saumil Merchant, Shell, India
  • Tilmann Rabl, DIMA, Toronto, Canada
  • Todor Ivanov, Goethe University, Germany