
Sina Madani, Java Developer Advocate

Sina is a Java Developer Advocate at Vonage. He comes from an academic background and is generally curious about anything related to cars, computers, programming, technology and human nature. In his spare time, he can be found walking or playing competitive video games.

Boost Your Productivity With Model-Driven Engineering (Part 3)

Published on January 31, 2024

#java

#low-code

#sdks

Time to read: 13 minutes

Introduction

Welcome back to the finale of this series on model-driven engineering. In Part 2, I described how I used these technologies to help save a lot of time in adding support for new APIs to the Vonage Java SDK. In this article, I will highlight some key principles based on lessons learned so that you can make the most of the technologies and approach.

Know Your "Why"

The first thing to bear in mind before going any further is to make sure that this is a good fit for your use case. As the saying goes, "When all you have is a hammer, everything looks like a nail". It's natural for technologies to go through fads - a pattern often described by the "Gartner hype cycle". Whilst certain technologies and development methodologies seem appetising, at the end of the day they are tools. Having a diverse toolbox is great because it means we can choose the right tool or approach for the job. What's right for one use case may not be for another. So, when is MDE the right choice and when is it not? Here are some questions to consider.

Is the domain easy to model?

Not all challenges fit neatly into an easily definable metamodel. Whilst, in many cases, it is possible to take an object-oriented view of a domain with predictable relationships between concepts in the domain, there may be times when the nature of these concepts and even how they relate to each other is too dynamic. To make the most of a model-driven approach, the domain's metamodel should be at least definable with a high degree of certainty. The concepts should map to classes with definable attributes and relations. If there are too many domain elements to define and the relations are too complex, then your metamodel will reflect this. In such cases, the effort of defining the metamodel and models may outweigh the benefits compared to implementing the code directly.

How complex will the generator be?

If your domain can be neatly defined using a metamodel, you also need to consider how these concepts map to the code and business logic you want to generate. There is an argument to be made for both extremes: if the logic is dense (i.e. highly dependent on the model), then auto-generating reduces the chance of errors. On the other hand, it can make the generator harder to maintain and work with. It's also worth asking how many templates you'll need to develop and maintain, as well as the overall complexity of the coordination logic for invoking these templates, especially when compared to hand-written code.

How much manual coding will it save?

One of the most important, and perhaps also most straightforward, questions is the ratio of boilerplate to logic. A model-driven template-based code generation approach works great when the generated code is mostly boilerplate such that the output can be pasted into your codebase with minimal or even no modifications at all. However, if the generated code requires significant additional manual coding - especially if it's modifying the generated code as opposed to e.g. adding new methods - then the value of the generator diminishes. This is because the overhead of developing and maintaining the generator, as well as refactoring or modifying the generated code by hand, can be a burden for some developers compared to writing these sections of the code from scratch.

How frequently will you use it?

Even if a model-driven approach is feasible - i.e., your domain fits neatly into a metamodel, and code generation is beneficial - it may not be worth the effort if you only plan to use it once. Of course, this depends on other factors like how much code you're generating: if it's a large project, it may well be worth the investment, even as a one-off. In other cases, the "savings" may be modest but add up over time through frequent usage. Another way of thinking about this question is, "How many models will we have?" Chances are, the more models you create (and the more frequently you change the models), the more usage you will get out of the generator.
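As a back-of-the-envelope illustration of this trade-off, the sketch below estimates how many generator runs it takes to recover the upfront cost of building it. The class name, method name, and all of the numbers are hypothetical and not taken from the Vonage Java SDK work; it also ignores ongoing maintenance of the generator, which would push the break-even point further out.

```java
/** Hypothetical helper for estimating when a code generator pays for itself. */
final class GeneratorBreakEven {
    private GeneratorBreakEven() {}

    /**
     * Returns the smallest number of runs after which the time saved is at least
     * the time spent building the generator, or -1 if generating (plus any manual
     * clean-up) is no faster than writing the code by hand.
     */
    static long runsToBreakEven(double buildHours,
                                double manualHoursPerRun,
                                double generatedHoursPerRun) {
        double savedPerRun = manualHoursPerRun - generatedHoursPerRun;
        if (savedPerRun <= 0) {
            return -1; // the generator never recovers its cost
        }
        return (long) Math.ceil(buildHours / savedPerRun);
    }
}
```

For example, with an assumed 40 hours to build the generator, 6 hours to hand-write the code for one API, and 1 hour to generate and tidy it up, `runsToBreakEven(40, 6, 1)` gives 8 runs.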

How easy is it to update your model(s)?

In the previous point, I mentioned how having more models or frequently changing models allows you to better leverage the benefits of using a code generation approach. However, if your models are constructed solely for this exercise and must be constantly updated to reflect changing business requirements or the environment, this can add friction. In my case, for example, any changes in the API spec require the model to be updated. The burden of updating the model is directly proportional to the extent of changes in the specification. If the models need to evolve rapidly, then that means any previously generated code is now invalid and needs to be re-generated. Consider this alongside the fact that some of the generated code may have been (or need to be) modified by hand, as discussed in a previous point.

Will the metamodel need to change frequently?

Since models are structured data, it is reasonable to expect them to change - sometimes even rapidly. This is mostly fine and relatively low friction if updates to the model are automated. What shouldn't change frequently, however, is the schema (metamodel). This is somewhat similar to the first point: if the concepts in your domain are changing frequently and difficult to nail down to a stable metamodel, then that makes it much more difficult to follow a model-driven approach. Ideally, the metamodel should mostly be revised during the prototyping stage and rarely once stable and "in production" (i.e. the output is usable). Adding fields to the metamodel is usually not a breaking change since they may be optional, but removing classes, relations, and attributes may (and usually will) break your existing models and workflow, meaning you have to start over in some cases.

Besides bug fixes, how frequently will the generator change?

During the initial development of your code generator templates and coordination logic, it is natural for there to be many bugs or omissions. Once these are mostly ironed out, and the generated output is as desired, the generator can be considered stable. However, even after this, there will inevitably be changes due to new business requirements, style guides, or code structure. Whatever the reason may be, it's worth considering that any changes to a stable generator risk introducing new bugs, which again need to be ironed out. Since these can usually only be detected when inspecting the output code, it would be ideal if the generator did not need to undergo frequent changes. This is because the generator is (or can be) harder to test than the actual generated code, so rapidly changing business requirements that need to be reflected in the templating logic introduce additional friction.
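One way to reduce the need for manual inspection is to keep a known-good copy of the output and compare each new generator run against it. The sketch below is only an assumption about how such a check might look, not something described in this series: `GoldenOutputCheck` and `firstDifferingLine` are hypothetical names, and the comparison assumes both strings use `\n` line endings.

```java
/** Hypothetical helper that compares generator output with a stored known-good copy. */
final class GoldenOutputCheck {
    private GoldenOutputCheck() {}

    /**
     * Returns the 1-based number of the first line that differs between the
     * expected and actual text, or 0 if the two are identical.
     */
    static int firstDifferingLine(String expected, String actual) {
        // A limit of -1 keeps trailing empty strings, so a missing final newline counts as a difference.
        String[] expectedLines = expected.split("\n", -1);
        String[] actualLines = actual.split("\n", -1);
        int common = Math.min(expectedLines.length, actualLines.length);
        for (int i = 0; i < common; i++) {
            if (!expectedLines[i].equals(actualLines[i])) {
                return i + 1;
            }
        }
        // All shared lines match; any difference is extra or missing lines at the end.
        return expectedLines.length == actualLines.length ? 0 : common + 1;
    }
}
```

A check like this does not say whether a change is correct, only that the output changed and where, which is often enough to catch an unintended side effect of editing a template.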

Is your team on board with this?

Last but not least, even if a model-driven approach seems like a perfect fit for your use case, it is unlikely to be successful if your manager and colleagues are unwilling to accommodate this workflow. After all, building the metamodel, generator, and models, as well as maintaining them, is an additional burden that adds complexity and requires people to understand the technologies used. Organisational factors play a role too, such as where the models come from, who will create or update them and how, who's responsible for the generator, and what kind of process will be used to extract generated output and add it to the existing codebase. There are many logistical challenges to consider regarding this approach, not to mention compliance and auditing! If you are using this approach to help generate boilerplate and everyone who uses it commits to manually inspecting and reviewing the generated code before adding it to the main codebase, there should be relatively little friction. On the other hand, if it's an integral and mandatory part of the development process with automation, deeper thought and due diligence are required. In any case, the people using the models, generator, or the resulting output should be aware of and on board with the approach.

Iterative Development

Suppose you've thought about all of the questions above and determined the approach is a good fit for your use case. How should you go about building it? Based on what I've presented so far, you may be under the impression that model-driven development is a "waterfall" process. Whilst it's true that the metamodel should be stable and is naturally the starting point, in practice there is more flexibility than the literature and tooling would have you believe.

In my case, I certainly did not start with the perfect metamodel, and even now, it still has substantial room for improvement. For example, one thing I found when creating models is that I have to create built-in Java types to be able to reference them, but these could be part of the metamodel. Many of the attributes in the model - particularly Boolean attributes like isRequest, isResponse, isHal, isQueryParams etc. - came about once I realised they'd be needed by the generator. Additive changes are not breaking: you can usually extend your metamodel with new types and attributes so long as they're optional or have a sensible default value.
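To illustrate why additive changes with defaults are safe, here is a minimal plain-Java sketch. It is not the actual metamodel from this series: `ClassModel` and its builder are hypothetical, and only the four attribute names are borrowed from the text above. Each Boolean attribute defaults to `false`, so a model written before an attribute existed still builds and behaves as before.

```java
import java.util.Objects;

/** Hypothetical metamodel element describing one class to generate. */
final class ClassModel {
    final String name;            // required since the first version
    final boolean isRequest;      // the flags below were added later and default to false
    final boolean isResponse;
    final boolean isHal;
    final boolean isQueryParams;

    private ClassModel(Builder b) {
        this.name = b.name;
        this.isRequest = b.isRequest;
        this.isResponse = b.isResponse;
        this.isHal = b.isHal;
        this.isQueryParams = b.isQueryParams;
    }

    static Builder named(String name) {
        return new Builder(name);
    }

    static final class Builder {
        private final String name;
        // Java initialises boolean fields to false, which serves as the default here.
        private boolean isRequest;
        private boolean isResponse;
        private boolean isHal;
        private boolean isQueryParams;

        private Builder(String name) {
            this.name = Objects.requireNonNull(name, "name");
        }

        Builder request() { this.isRequest = true; return this; }
        Builder response() { this.isResponse = true; return this; }
        Builder hal() { this.isHal = true; return this; }
        Builder queryParams() { this.isQueryParams = true; return this; }

        ClassModel build() {
            return new ClassModel(this);
        }
    }
}
```

A model such as `ClassModel.named("Thing").build()` never mentions the newer flags and still works, whereas removing or renaming an attribute would break every model that sets it.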

Model-driven development is still software development, so an iterative process allows you to get feedback faster rather than investing too heavily upfront when you have less information. Therefore in practice, I recommend that you develop the metamodel, generator, and a sample model in parallel so that you can see the workflow end-to-end and spot any missing parameters in your metamodel as well as errors in the generated text early on rather than after you've engineered what on paper seems to be "complete".

"One-Shot" or "Pure" MDE?

The use case I presented in Part 2 was very much a "one-shot" approach. That is, once the code has been generated, I no longer have use for the generator or model. For example, if the API specification is updated or I spot some bugs in the output code, then unless there are major issues or revisions, chances are I'm not going to update the model and regenerate the code because I've already refactored and evolved the generated code. After all, the output is what matters to me and what will be maintained. In that sense, I wasn't really practising MDE as it's "supposed" to be done. The generated code is a starting point to build on, not the finished product. Hence, some may argue it's not model-driven in spirit. That doesn't mean it's not useful, though.

In a traditional model-driven approach, the code would ideally be regenerated from the model every time the model is updated so that the codebase and model are always in sync. Think about it this way: when you write a program in a JVM language (Java, Kotlin, Groovy, Scala, etc.), what do you commit to your source control? What do you maintain? What do you care about? It's the source code. But that source code isn't what matters at runtime; it's the bytecode (i.e., the .class files). Why is it that we care greatly about our code when it all compiles down to Java bytecode anyway? Because the bytecode is deterministically derived from the source code. One could argue that the main asset in these languages is the compiler. Well, a compiler is essentially a transformer: your source code is the model (conforming to the language's metamodel), the compiler is the generator, and the bytecode is the output from the generator (I have previously written about this if you're curious). Similarly, the generated code in "pure MDE" should be treated the same way as compiled code: necessary but not something we need to look at or care much about. What you maintain as "source code" in MDE is the models, together with the generator templates that transform them.

It's important to reflect on this alongside the questions I posed earlier because manually maintaining synchronisation between the model and source code is a source of friction and potential errors. If you adopt a more pragmatic approach where you want to get some initial code and don't plan on re-using the generator with the same model, then this is less of a concern. Moreover, if the generated code doesn't need much (or any) manual modification, you can update the model or generator when you spot any errors in the output code and regenerate it with minimal manual effort. On the other hand, if the output code needs to be modified substantially by hand after generation, it's probably not worth updating the model and regenerating because you have to reconcile the manual changes with the generated code again.

Garbage In, Garbage Out

That brings me to my next point. You only get out of this approach what you put into it. This is true for both the model(s) and generator. If you cut corners when building your model by leaving non-essential fields blank, the generated code will be incomplete in the same places. In my example, I was sometimes lazy with the documentation and values used for testing as I figured I'd need to tweak them anyway. But when I did invest more effort, I found I needed to make fewer modifications. As previously mentioned, you can always add to or tweak your generator templates if you find additional functionality is needed or there are formatting errors. Taking the time to ensure your model is correct and as complete as possible relative to your needs will reduce errors and the need to rerun the generator. This is especially important in one-shot approaches or when the generated code is manually edited after the initial generation.

Pareto Principle Applies

The 80/20 rule very much applies to model-driven code generation in my experience and reinforces the case for iterative development. You can get roughly 80% of the benefits with 20% of the effort. If you start with simple templates where you're generating even just the basic skeleton of the classes you'll need, you've already written a good chunk of the code. You have the classes with the correct names in the right package, with the copyright header, typical imports, methods, field declarations, constructors, etc. You can then gradually move the more complex hand-written logic into the generator if you find it repetitive.
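As a rough idea of how little a "basic skeleton" template needs, here is a minimal sketch using a Java text block (so it assumes Java 17 or later). It is not one of the templates from this series: `SkeletonTemplate`, its `render` method, and the example package and class names are all hypothetical.

```java
/** Hypothetical generator that emits only the skeleton of a class. */
final class SkeletonTemplate {
    private SkeletonTemplate() {}

    // The template is almost entirely static text, with four %s placeholders.
    private static final String TEMPLATE = """
            /* %s */
            package %s;

            public class %s {

                public %s() {
                    // TODO: hand-written logic goes here
                }
            }
            """;

    /** Fills in the copyright notice, package, and class name (used twice). */
    static String render(String copyright, String packageName, String className) {
        return TEMPLATE.formatted(copyright, packageName, className, className);
    }
}
```

Calling `SkeletonTemplate.render("Copyright 2024 Example", "com.example.demo", "SendMessageRequest")` yields a file with the header, package declaration, class, and empty constructor already in place, leaving only the logic to write by hand.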

Start with the low-hanging fruit, and don't try to do too much at once. I found this to be especially the case for generating test code: writing a generator to just create the test classes and method stubs with the assertions saved a lot of time. I could then focus on hand-writing the essential testing logic without worrying about the boilerplate. This also adds structure to your workflow, so you know exactly what needs to be done without being overwhelmed by the need to deal with all the little things. You'd be surprised how much this alone can boost your motivation and productivity!

On the flip side, it's important to balance this with the practicalities of maintaining and evolving the codebase. If you care mostly about maintaining the generated code rather than the generator and models, don't over-engineer your generator because you'll eventually find yourself making it far more complex for very marginal benefits. This then makes the generator more intimidating and difficult to work with, so onboarding others becomes a challenge. As a rule of thumb, try to make each generator template look similar to the generated output. If your templates consist more of dynamic sections with heavy branching logic than of static text, it's probably worth splitting out that logic into utility methods or simplifying the template to make it more readable.
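The rule of thumb about keeping templates close to their output can be sketched as follows. This is an illustrative example only, assuming Java 17 or later for the text block: `FieldTemplate`, its helper methods, and the `${...}` placeholder syntax are hypothetical and are simply replaced with `String.replace`. The branching (list or single value, required or optional) lives in small helpers, so the template itself still reads like the code it produces.

```java
/** Hypothetical template for a field and its getter, with branching kept in helpers. */
final class FieldTemplate {
    private FieldTemplate() {}

    // Mostly static text: the shape of the output is visible at a glance.
    private static final String TEMPLATE = """
            ${modifier} ${type} ${name};

            public ${type} ${getter}() {
                return ${name};
            }
            """;

    /** Wraps the element type in a List when the field holds multiple values. */
    static String declaredType(String elementType, boolean isList) {
        return isList ? "java.util.List<" + elementType + ">" : elementType;
    }

    /** Required fields are final; optional ones are not. */
    static String modifier(boolean required) {
        return required ? "private final" : "private";
    }

    /** Derives a conventional getter name, e.g. "tags" becomes "getTags". */
    static String getterName(String fieldName) {
        return "get" + Character.toUpperCase(fieldName.charAt(0)) + fieldName.substring(1);
    }

    static String render(String name, String elementType, boolean isList, boolean required) {
        return TEMPLATE
                .replace("${modifier}", modifier(required))
                .replace("${type}", declaredType(elementType, isList))
                .replace("${getter}", getterName(name))
                .replace("${name}", name);
    }
}
```

If a new rule is needed later, it can go into one of the helpers (or a new one) without adding conditionals to the template text.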

In any case, it's worth bearing in mind that there is no "magic" here: creating and maintaining the MDE workflow along with its tooling is an additional overhead. Be aware of how much time you're investing into it relative to the savings you're making. And that brings us full circle into justifying the rationale for this in the first place.

Conclusion

With that, we're at the end of this blog mini-series on pragmatic model-driven engineering. I hope you have not only learnt about some new tools and a development approach but are also aware of their limitations and how to make the most of these technologies. Ultimately, I hope that this introduction to model-driven engineering and case study serves as a useful example and helps you make better decisions in your software projects, should you choose to go down this route.

If you have any comments or suggestions, feel free to reach out to us on X (formerly known as Twitter) or drop by our Community Slack. I welcome any thoughts and opinions. If you enjoyed this article, please check out my other Java articles.