The Software Engineering for Machine Learning Applications (SEMLA) initiative aims to bring together leading software engineers, machine learning experts, and practitioners to reflect on and discuss the challenges and implications of building complex data-intensive software systems. The event’s structure follows that of a working conference, in which panels, working groups, and talks reflect on the many hurdles of developing, deploying, and maintaining data-intensive software systems that integrate machine learning elements. Our concerns as software engineers can be summarized in the following two questions:

1) Do data scientists/ML or AI experts/domain experts possess the necessary skills to develop, maintain and extend the complex software implementations of analytical algorithms?

Defining algorithms and validating them in vitro is vital; however, their implementation and integration in real-world applications are equally if not more important, since it is the software that will be used and stressed directly, not the core algorithm. Coding is neither where software begins nor where it ends. Therefore, it is important that the task of implementing the algorithms in software is taken on by well-trained and well-equipped professionals, in collaboration with the various experts.

2) Do software engineers have or need to have intimate knowledge of the domain or the algorithm for which they provide an implementation?

Just as software skills are important for ML applications, so is an understanding of ML and of the domain. Without it, the software engineer may fail to include the proper constraints, to develop the proper test cases, or ultimately to develop the software in an extensible and maintainable manner. While it would be ideal to have professionals trained in both ML and SE (and possibly knowledgeable about the application domain), this is rarely possible. Fortunately, software engineering has developed a plethora of techniques and processes to improve communication and collaboration within diverse teams, and between the development team and the domain experts or clients. Unfortunately, these processes are not always adopted by non-software professionals.

In trying to respond to these questions, we may find ourselves circling around two broad solutions, each of which raises further questions of its own:

1) How can we change and shape the education of computer scientists/engineers so that they excel in developing data-intensive software systems (and others of a similar nature)?

There is an increasing realization that software engineering and computer engineering need to be multidisciplinary domains, given how widely they are applied. In this sense, such professionals need to receive broad training and education that is, at the same time, applied and aligned with the needs of the market.

2) What tools do we need to develop to provide support to all stakeholders of such a system (software engineers, ML/domain experts)?

Besides proper training and formal processes, we also need proper automated tools to support the development of the system and the communication between the various experts. Such tools will make development more efficient and effective, and will also increase the likelihood that future extensions succeed. The response to this question will spur new research to facilitate practice.

Specific goals of the symposium include:

  • Broadening awareness within the software engineering community of the potential for the application of machine learning algorithms;
  • Facilitating the exchange of ideas and interaction between international researchers in machine learning and software engineering;
  • Defining open and key research problems faced in realizing usable tools/approaches for intelligent system design, development, test, and evolution;
  • Constructing a foundation of materials for future research on SEMLA.