yakov.codes

Focus on Data

Data-oriented approaches are widely used in modern software engineering, and they differ in implementation, meaning, and use. This article discusses some of them, along with their value and applications.

Commonly confused paradigms #

Two paradigms unfortunately share similar names, and it is important to tell them apart. Everything in this article relates to the first one, as most of the things discussed can be categorized as part of it. That is not to paint the second one as insignificant, though: it is just a very different concept that solves very different problems.

Data-driven programming #

A quote from Wikipedia:

In computer programming, data-driven programming is a programming paradigm in which the program statements describe the data to be matched and the processing required rather than defining a sequence of steps to be taken.

A broad definition, I know. It fits simple map-reduce list transformations that describe the logic of an application, complex asynchronous Stream transformations, and something akin to the attribute set manipulations of a dynamic language such as Nix.

The main point is that the data itself describes both the control flow and the logic, so the whole application becomes a collection of high-level, abstract data transformations. The transformations themselves are generic, and the program becomes a set of rules that command those transformations.
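To make this less abstract, here is a minimal sketch in TypeScript. The rule table, the `classify` function, and the log lines are all made up for illustration; the point is only that the behavior lives in the table while the engine stays generic.

```typescript
// A rule is plain data: what to match and how to label the match.
interface Rule {
  prefix: string;
  label: string;
}

// Hypothetical rule table; changing behavior means editing data, not code.
const rules: Rule[] = [
  { prefix: 'ERROR', label: 'alert' },
  { prefix: 'WARN', label: 'note' },
];

// The generic engine: it knows nothing about errors or warnings.
function classify(line: string, table: Rule[], fallback: string): string {
  for (const rule of table) {
    // The first rule whose prefix starts the line wins.
    if (line.indexOf(rule.prefix) === 0) {
      return `${rule.label}: ${line}`;
    }
  }
  return `${fallback}: ${line}`;
}

const classified = ['ERROR disk full', 'INFO started', 'WARN slow'].map(
  (line) => classify(line, rules, 'skip'),
);
```

Adding a new kind of line is a one-entry change to `rules`, and `classify` never needs to be touched.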

The beautiful book The Art of Unix Programming dives deeper, with examples drawn from, surprisingly, Unix systems.

Data-oriented design #

A completely different approach from data-driven programming. It focuses on optimizing data structures by grouping each field of many objects into its own array inside a single composite object (a structure of arrays), which makes the object of interest implicit: it is identified only by its index.

So, instead of having

typedef User = (
  String name,
  int age,
);

typedef Users = List<User>;

the list of users is represented as

typedef Users = (
  List<String> name,
  List<int> age,
);
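The same two layouts can be sketched in TypeScript, together with a function that benefits from the second one. The names `toColumns` and `averageAge` are hypothetical, and plain JavaScript arrays do not guarantee contiguous memory, so treat this as an illustration of the layout rather than a performance claim.

```typescript
// Array of structures: one object per user.
interface User {
  name: string;
  age: number;
}

// Structure of arrays: one array per field; a user is just an index.
interface Users {
  names: string[];
  ages: number[];
}

// Converts the row-oriented layout into the column-oriented one.
function toColumns(rows: User[]): Users {
  const columns: Users = { names: [], ages: [] };
  for (const row of rows) {
    columns.names.push(row.name);
    columns.ages.push(row.age);
  }
  return columns;
}

// Reads only the `ages` column; the names are never touched.
function averageAge(columns: Users): number {
  let sum = 0;
  for (const age of columns.ages) {
    sum += age;
  }
  return columns.ages.length === 0 ? 0 : sum / columns.ages.length;
}

const columns = toColumns([
  { name: 'Ann', age: 30 },
  { name: 'Bob', age: 40 },
]);
const meanAge = averageAge(columns);
```

User number `i` is now the pair `columns.names[i]` and `columns.ages[i]`; there is no `User` object anywhere in the second layout.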

Representing effects as data #

One data-focused approach is to represent impure logic as pure data that gets evaluated later. Both FP and OOP have a version of it; the versions differ slightly but keep the same main idea.

Data as a description of an Effect #

In OOP and closer-to-OOP-than-others languages it is called the Interpreter pattern. Somewhere in the middle, it can be viewed as a simple effectful function over pure data types. In FP it sits at the core, as the written code must be pure and every effect must be represented as some kind of data.

Below is an example of a pure function with async effects and logging in TypeScript.

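// Arrow functions cannot be generators, so `run` is an async generator declaration.
// Each yielded object only describes an effect; nothing is executed here.
async function* run() {
  yield {type: 'log', data: 'Starting'}
  yield {type: 'value', data: 10}
  yield {type: 'log', data: 'Waiting for 1 second'}
  yield {type: 'wait', data: 1}
  yield {type: 'log', data: 'Done!'}
}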

The call site processes those values and evaluates them, while the source function just yields plain values.
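What the call site might look like can be sketched as follows. This is a simplified, synchronous variant: the effects are returned as an array instead of being yielded, and the `Handlers` interface and `interpret` function are hypothetical names, not part of any library.

```typescript
// Effect descriptions: plain data with no behavior attached.
type Effect =
  | { type: 'log'; data: string }
  | { type: 'value'; data: number }
  | { type: 'wait'; data: number };

// The pure part: it only describes what should happen.
function describeRun(): Effect[] {
  return [
    { type: 'log', data: 'Starting' },
    { type: 'value', data: 10 },
    { type: 'log', data: 'Waiting for 1 second' },
    { type: 'wait', data: 1 },
    { type: 'log', data: 'Done!' },
  ];
}

// Hypothetical handler set: the impure part is supplied by the caller.
interface Handlers {
  log: (message: string) => void;
  wait: (seconds: number) => void;
}

// The interpreter walks the descriptions and decides how to run each one.
function interpret(effects: Effect[], handlers: Handlers): number[] {
  const produced: number[] = [];
  for (const effect of effects) {
    switch (effect.type) {
      case 'log':
        handlers.log(effect.data);
        break;
      case 'value':
        produced.push(effect.data);
        break;
      case 'wait':
        handlers.wait(effect.data);
        break;
    }
  }
  return produced;
}

// A recording interpreter: it notes the effects instead of performing them.
const trace: string[] = [];
const produced = interpret(describeRun(), {
  log: (message) => { trace.push('log:' + message); },
  wait: (seconds) => { trace.push('wait:' + seconds); },
});
```

Swapping the handlers for ones that write to the console and actually sleep changes how the program runs without touching `describeRun`, which is also what makes it trivial to test.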

Sandwiches! #

There is a whole architectural approach that focuses on this kind of design for languages that are not strictly functional. It might sound limiting, but all "strict" approaches have their benefits, and this one is widely used, either under different names or as an explicit part of a bigger combination of guidelines.

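An impure-pure-impure sandwich: impure code gathers the inputs, a pure core makes the decisions, and impure code carries out the results.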
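A rough sketch of such a sandwich in TypeScript is below. The order and discount types, the 10% rule, and the `load`/`save` callbacks are all invented for the example; real code would read from and write to a database or an API at the edges.

```typescript
interface Order {
  id: number;
  totalCents: number;
}

interface Discount {
  orderId: number;
  amountCents: number;
}

// Pure filling: the same orders always produce the same discounts.
function decideDiscounts(orders: Order[]): Discount[] {
  return orders
    .filter((order) => order.totalCents >= 10000)
    .map((order) => ({
      orderId: order.id,
      amountCents: Math.floor(order.totalCents / 10),
    }));
}

// The sandwich itself: impure read, pure decision, impure write.
function processOrders(
  load: () => Order[],
  save: (discounts: Discount[]) => void,
): void {
  const orders = load(); // impure: fetch the inputs
  const discounts = decideDiscounts(orders); // pure: decide
  save(discounts); // impure: act on the decision
}

// In-memory stand-ins for the impure edges.
const saved: Discount[] = [];
processOrders(
  () => [
    { id: 1, totalCents: 20000 },
    { id: 2, totalCents: 5000 },
  ],
  (discounts) => {
    for (const discount of discounts) {
      saved.push(discount);
    }
  },
);
```

All the business logic sits in `decideDiscounts`, so it can be tested with plain values and no mocks.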

Dataflow and Reactive programming #

Yet another data-centric approach, and arguably the most straightforward one in terms of how closely the solution mirrors the problem, is a set of reactive-like paradigms.

Both deal with transformations of data as it flows through the system, and Reactive programming can be considered a subset of Dataflow. Both are concerned with the propagation of data through a set of transformers, with Dataflow operating on the data itself and Reactive programming operating on changes to the data.

Reactive programming can be further classified into synchronous/asynchronous and push/pull. Beyond that, even implementations that fall into the same class can differ greatly: something like Effector is very, very different from ReactiveX, even though they share a lot in common.
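As a point of reference, here is a deliberately tiny synchronous sketch that supports both styles. The `Cell` class and `derive` function are hypothetical and do not mirror the API of Effector, ReactiveX, or any other library: `get` is the pull side and `subscribe` is the push side.

```typescript
type Listener<T> = (value: T) => void;

// A value that can be read on demand (pull) or observed for changes (push).
class Cell<T> {
  private current: T;
  private listeners: Listener<T>[] = [];

  constructor(initial: T) {
    this.current = initial;
  }

  // Pull: the consumer asks for the current value.
  get(): T {
    return this.current;
  }

  // Push: every change is propagated to the subscribers immediately.
  set(next: T): void {
    if (next === this.current) {
      return; // no change, nothing to propagate
    }
    this.current = next;
    for (const listener of this.listeners) {
      listener(next);
    }
  }

  subscribe(listener: Listener<T>): void {
    this.listeners.push(listener);
  }
}

// A derived cell recomputes itself whenever its source changes.
function derive<A, B>(source: Cell<A>, transform: (value: A) => B): Cell<B> {
  const derived = new Cell<B>(transform(source.get()));
  source.subscribe((value) => derived.set(transform(value)));
  return derived;
}

const price = new Cell(10);
const doubled = derive(price, (value) => value * 2);
const seen: number[] = [];
doubled.subscribe((value) => { seen.push(value); });
price.set(21);
price.set(21); // same value again: nothing is propagated
```

Note that what travels through `subscribe` is the change, not the whole history of values, which is the distinction from Dataflow drawn above.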

Dataflow's definition is much more vague, but other than operating on the data itself, it is more concerned with graphs, and TensorFlow builds upon this very notion. This article explains it in much more depth.
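The graph idea can be sketched in a few lines. This is a toy evaluator under stated assumptions: the graph is acyclic, every node produces a number, and the `GraphNode`, `Graph`, and `evaluate` names are made up. It is not how TensorFlow represents or executes its graphs.

```typescript
// A node names its inputs and knows how to combine their values.
interface GraphNode {
  inputs: string[];
  compute: (args: number[]) => number;
}

type Graph = { [name: string]: GraphNode };
type Cache = { [name: string]: number };

// Evaluates a node by first evaluating everything it depends on.
// Assumes the graph has no cycles.
function evaluate(graph: Graph, name: string, cache: Cache = {}): number {
  const cached = cache[name];
  if (cached !== undefined) {
    return cached; // shared inputs are computed only once
  }
  const node = graph[name];
  if (node === undefined) {
    throw new Error(`unknown node: ${name}`);
  }
  const args = node.inputs.map((input) => evaluate(graph, input, cache));
  const result = node.compute(args);
  cache[name] = result;
  return result;
}

const sum = (args: number[]) => args.reduce((total, value) => total + value, 0);
const product = (args: number[]) => args.reduce((total, value) => total * value, 1);

// scaled = (a + b) * b
const graph: Graph = {
  a: { inputs: [], compute: () => 2 },
  b: { inputs: [], compute: () => 3 },
  total: { inputs: ['a', 'b'], compute: sum },
  scaled: { inputs: ['total', 'b'], compute: product },
};

const scaled = evaluate(graph, 'scaled');
```

The program here is the `graph` value itself: the data describes which transformations exist and how values flow between them.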

On FRP #

One very specific implementation of reactive programming is Functional Reactive Programming, initially proposed in the paper Functional Reactive Animation. It is not widely used due to its implementation details, but it is very influential nonetheless, and the aforementioned ReactiveX can be thought of as something that resulted from FRP.

type Behavior a = Time → a
type Event a = [(Time, a)]
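These two definitions translate almost literally into TypeScript. The sketch below is a model of the semantics, not an efficient implementation, and it makes two assumptions: occurrences are sorted by time, and a new value becomes visible strictly after its occurrence. The names `EventStream`, `stepper`, and `mapBehavior` are chosen for the example.

```typescript
type Time = number;

// A behavior is a value defined at every point in time.
type Behavior<A> = (time: Time) => A;

// An event is a list of timed occurrences, assumed sorted by time.
interface Occurrence<A> {
  time: Time;
  value: A;
}
type EventStream<A> = Occurrence<A>[];

// Turns an event into a behavior that holds the latest value.
// Assumption: a value is visible strictly after its occurrence time.
function stepper<A>(initial: A, event: EventStream<A>): Behavior<A> {
  return (time) => {
    let current = initial;
    for (const occurrence of event) {
      if (occurrence.time < time) {
        current = occurrence.value;
      }
    }
    return current;
  };
}

// Transforming a behavior is just function composition.
function mapBehavior<A, B>(
  behavior: Behavior<A>,
  transform: (value: A) => B,
): Behavior<B> {
  return (time) => transform(behavior(time));
}

const clicks: EventStream<string> = [
  { time: 1, value: 'red' },
  { time: 3, value: 'green' },
];
const color = stepper('grey', clicks);
const loudColor = mapBehavior(color, (value) => value.toUpperCase());
```

Sampling `color` at any time gives an answer, even between occurrences, which is what separates a behavior from an event.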

As usual, there are a lot of flavors. I can point to Reactive Banana as a typical yet elegant implementation that serves as a good example for understanding the paradigm. Its unofficial tutorial is also nice.

UI as a function of State #

In current component-based UI systems, the UI is represented as a function of state. It is just another function over data: it accepts immutable inputs and outputs a configuration that describes the resulting UI and the behavior of application-level code.

This approach was popularized by React, but it is now used pretty much universally: Flutter uses it, SwiftUI uses it, Jetpack Compose uses it, and so on.

I really like Flutter's take on this approach, which makes the idea a bit more radical by representing everything as a Widget. This means that data, the Widgets, describes every aspect of application-level behavior, from lifecycle events, such as showing a snack bar on opening a specific page, to the configuration of the resulting UI, such as the color of a button.
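Stripped of any framework, the idea fits in a handful of lines. The `AppState` and `ViewNode` types and the `view` function below are hypothetical; the sketch only shows that the output is an inert description that some renderer would later turn into pixels.

```typescript
interface AppState {
  count: number;
  darkMode: boolean;
}

// Hypothetical description type: plain data, not a live UI object.
interface ViewNode {
  kind: string;
  text: string;
  color: string;
  children: ViewNode[];
}

// UI = f(State): no side effects, no hidden inputs.
function view(state: AppState): ViewNode {
  const color = state.darkMode ? 'white' : 'black';
  return {
    kind: 'column',
    text: '',
    color,
    children: [
      { kind: 'label', text: `Count: ${state.count}`, color, children: [] },
      { kind: 'button', text: 'Increment', color, children: [] },
    ],
  };
}

const tree = view({ count: 2, darkMode: true });
```

Because `view` is pure, two calls with equal state produce equal trees, which is what lets a framework diff descriptions instead of mutating the UI by hand.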

On hooks #

Funnily enough, React gave birth to this UI = f(State) paradigm, and React abandoned it by introducing hooks into the equation.

There is a lot of disagreement about the concept of hooks, but one thing is a fact: hooks break the paradigm by making Components/Widgets impure. Passing the same props to a Component that uses hooks won't necessarily produce the same output, as hooks introduce state bound to the circumstances of the caller.
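The difference is easy to reproduce without React at all. In the sketch below, `useClicks` is a hypothetical stand-in for a hook backed by state that the component's signature does not mention; it is not React's API and ignores everything a real hook runtime does.

```typescript
interface Props {
  label: string;
}

// State owned by the "runtime", invisible in any component signature.
let hiddenClicks = 0;

// Hypothetical hook: reads and updates the hidden state.
function useClicks(): { clicks: number; click: () => void } {
  return {
    clicks: hiddenClicks,
    click: () => { hiddenClicks += 1; },
  };
}

// Pure component: everything it depends on is an argument.
function pureButton(props: Props, clicks: number): string {
  return `${props.label}: ${clicks}`;
}

// Hook-based component: same props, but the output depends on hidden state.
function hookButton(props: Props): { text: string; click: () => void } {
  const { clicks, click } = useClicks();
  return { text: `${props.label}: ${clicks}`, click };
}

const firstRender = hookButton({ label: 'Like' });
firstRender.click();
const secondRender = hookButton({ label: 'Like' });
```

Both renders received identical props, yet `firstRender.text` and `secondRender.text` differ, while `pureButton` can only change its output when an argument changes.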

Flutter solves this problem by explicitly passing a BuildContext to each and every widget: a handle to the widget's location in the tree, through which it can reach the state of the context it exists in. Given that the context is used with InheritedWidgets, it is possible to achieve stateful behavior without making a widget impure.
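The principle, reduced to its core, is that the surrounding state becomes one more argument. The sketch below is a TypeScript analogy only: the `Context` interface and `greeting` function are invented, and Flutter's actual BuildContext and InheritedWidget APIs look quite different.

```typescript
// Hypothetical stand-in for a build context: values provided by ancestors.
interface Context {
  theme: string;
  locale: string;
}

interface GreetingProps {
  user: string;
}

// The widget stays a pure function of (context, props).
function greeting(context: Context, props: GreetingProps): string {
  const hello = context.locale === 'fr' ? 'Bonjour' : 'Hello';
  return `[${context.theme}] ${hello}, ${props.user}`;
}

const english = greeting({ theme: 'dark', locale: 'en' }, { user: 'Ada' });
const french = greeting({ theme: 'dark', locale: 'fr' }, { user: 'Ada' });
```

The output still varies with the circumstances of the caller, but those circumstances are now visible in the signature instead of being hidden behind it.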

Composition vs Inheritance #

Prioritizing data also has its applications when it comes to grouping functionality. Composition over Inheritance (CoI) is a well-used approach that is considered a "best practice" in a lot of cases, and it is pretty much as data-centric as it gets in the otherwise data-opposed (and behavior-oriented) setting of combining functionality and achieving polymorphism.

Returning to Flutter as an example, it very strongly favors composition over inheritance, with the vast majority of widgets having only a single level of inheritance from their abstract parent class. The widgets are described as a composed, immutable data structure as well, with the polymorphic interface of the Widget class.
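A small TypeScript sketch of the same shape: one shallow interface and behavior built by nesting values rather than by subclassing. The `Widget` interface and the `text`, `bordered`, and `row` helpers are hypothetical and only loosely inspired by Flutter's style.

```typescript
// The single polymorphic interface every widget implements.
interface Widget {
  render(): string;
}

// A leaf widget.
function text(value: string): Widget {
  return { render: () => value };
}

// Adds a border by wrapping another widget, not by extending it.
function bordered(child: Widget): Widget {
  return { render: () => `[${child.render()}]` };
}

// Lays out any number of children side by side.
function row(children: Widget[]): Widget {
  return {
    render: () => children.map((child) => child.render()).join(' '),
  };
}

// New behavior comes from new combinations, not from new subclasses.
const toolbar = row([bordered(text('Save')), bordered(text('Cancel'))]);
const rendered = toolbar.render();
```

A "bordered button row" never needs its own class; it is just a value assembled from three small pieces.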

This demonstrates not only the extent to which the compositional approach can be pushed but also that it is indeed handy to combine approaches, as stated in the book Design Patterns.

Multiple inheritance #

A very short tangent regarding Multiple Inheritance: CoI is an example of the simplest solution to a problem that is traditionally solved by using different "flavors" of inheritance instead.

Take a look at Dart's Mixins and Swift's Protocols and see how composition can be used to solve the very same problem, without stepping away from data as a means of implementing functionality.
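For a feel of how this works without any inheritance, here is a rough TypeScript analogue in which capabilities are layered onto plain data. It is only an approximation of the idea: the `withGreeting` and `withShouting` helpers are invented, and neither Dart's mixins nor Swift's protocols are implemented this way.

```typescript
interface Named {
  name: string;
}

interface CanGreet {
  greet(): string;
}

interface CanShout {
  shout(): string;
}

// Each capability is a function that extends any value carrying a name.
function withGreeting<T extends Named>(base: T): T & CanGreet {
  return { ...base, greet: () => `Hi, ${base.name}` };
}

function withShouting<T extends Named>(base: T): T & CanShout {
  return { ...base, shout: () => base.name.toUpperCase() + '!' };
}

// Two independent capabilities combined, with no class hierarchy involved.
const person = withShouting(withGreeting({ name: 'Ada' }));
const greetingText = person.greet();
const shoutText = person.shout();
```

The order of composition is explicit at the call site, so the ambiguity that multiple inheritance has to resolve with special rules never arises here.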