4. The full list of shapes under a canonical representation
5. Coupling this world with time
6. Temporal morphogenesis at work: simple examples, visualizations, and some rules
7. Towards partitions of the morphospace: patterns and indications for future research
8. The meso-level: on the crucial operations of aggregation and disaggregation of shapes in shapes
9. Attribution of a unit to a class
10. Relaxing the initial simplifying assumptions
11. Final generalised definitions
12. Basic guidelines for empirical use of temporal morphogenesis
13. Basic guidelines of temporal morphogenesis for interpreting the assumptions and the results of models
14. A few examples of application
14.1. Income distribution dynamics
14.2. Industrial dynamics of market shares
14.3. Macroeconomic business cycles
Appendix 1 - List of morphospaces released
Appendix 2 - Code
Appendix 3 - A nested approach to time
Appendix 4 - On the average of meso-shapes belonging to the same morphospace of the macro-shape
Appendix 5 - The periodic table of shapes
Appendix 6 - A stochastic interpretation of morphospaces
Appendix 7 - Small sample statistics: the role of morphospaces and patterns
This paper provides a method to enumerate all possible states-of-the-world and changes in a micro-meso-macro system, as well as all possible shifts in the emerging distribution of entities. Although the analysis is conducted at a fairly abstract level, bordering on mathematics, this conceptual pathway includes a host of potential applications for the kind of pluralistic perspective that we are developing for economics.
We provide a well-defined context in which the problem is sensible and meaningful. A certain number n of entities, which we shall call "units", have certain features, which we shall call "classes", whose numerosity we shall denote as s. We intend to enumerate all possible vectors (ordered collections of units referring to all classes) and to study the way in which one shifts to another, due to abstract processes at the unit level (microdynamics). Conversely, the analysis will lead us to identify how a certain observed change at the macrolevel can shed light on what is happening at the micro and meso levels. In order to speak meaningfully about dynamics, we shall couple the list with time as elaborated here - thus, in particular, by separating a logical time from a chronometric one.
We highlight the relevance of generating full lists of ordered row vectors with certain features (in terms of number of columns, sum of all items of the vector, each assumed to be an integer, and further additional restrictions), and we hand you a computational method to generate them, while hinting at several avenues for manipulating and utilising the results.
We call our approach "temporal morphogenesis" because we aim to enumerate all possible shapes (morpho, or rather μορφώ in Greek, a name under which the goddess Aphrodite was also worshipped) and the way in which each of them transitions to another over time (thus singling out the latter's origin or genesis, γένεσις). The name is also an echo of, and a tribute to, René Thom and his catastrophe theory. Since the Greek word morpho is not common in English, we equivalently call "shape" what is generated during the morphogenesis.
Entities can be e.g. individuals in empirical surveys, companies in market studies, agents in agent-based models. Respectively, classes can be answers to questions, industry and geographical classifications, values in models' parametrization and in their resulting behaviours.
This approach provides precise formulations for "emergent properties", a central but often elusive feature of non-linear and evolutionary models, including agent-based models. It does so in connection with real empirical data, thus providing an interesting bridge between models and data.
The field includes the possibility that a unit may lie between two classes, that classes have fuzzy borders, and that sets of classes refer to different aspects, which is the case for the following units:
This image presents a set of different geometrical shapes of different sizes and colours, so that the classes (square, circle), the classes (small, medium, large) and the classes (orange, blue, green, yellow) are each well defined, with 2, 3 and 4 elements respectively, but taken together they refer to three different categories (geometrical shape, size, colour), with the corresponding states-of-the-world (6, 4); (2, 7, 1); (1, 7, 1, 1).
But the discussion of such cases is deferred until after a simpler, more basic case, in which n and s are single integers. Changes in the number n (which can be due to the birth and death of units, to splitting and merging, as well as to entry into and exit from other systems) are allowed, as are changes in the number s (which can be due to a technological innovation leading to the creation of new possible features, or to a more or less painful sectoral euthanasia). But again, this discussion is postponed until after a core treatment in which s and n are fixed.
To summarize, the order of this paper is the following:
1. initial definition of the simplified world we are talking about;
2. the definition of change in such a world;
3. numerical examples with low numbers of units and classes;
4. a core method for enumerating all possible shapes at the macrolevel for any given number n of units distributed in any number s of classes;
5. the full list of shapes (also called "morphospace") under a canonical representation in which n = 50 and s = 5;
6. explicit coupling of this world with logical and chronometric time;
7. putting temporal morphogenesis to work, with a few simple examples, visualizations and some initial reflections on the kinds of processes that lead from one shape to another;
8. introducing possible partitions (taxonomies) across shapes and a method to validate them;
9. introducing the crucial meso-level with operations from shapes to shapes;
10. relaxation of several simplifying assumptions regarding units and classes, with final generalised definitions of the world and its temporality;
11. outline of a few examples of application of the theoretical construct to key issues in economics, namely income distribution dynamics, industrial dynamics of market shares and macroeconomic business cycles, while deferring to a future discussion its application in political science, namely referendum and election results, opinion polls and surveys.
Throughout the text, we shall provide you with practical tools to run your own analyses, opening up suggested lines of action. We are far from exhausting the field with this seminal paper!
We have added several appendices, to put order to certain issues or provide new insights, including an important proposal for a periodic table of shapes and a probabilistic approach that can shed light on small-sample statistics. The main text can be read without these additional elements, which, correspondingly, have an autonomous value.
Our core method (4.) is related to a certain branch of mathematics, namely combinatorics, in whose language what we are talking about is usually referred to as a "weak composition of the integer n into s parts". Such language, however, obscures the goals we have, which is why we have chosen the conventions given previously. Still, we do leverage some specific results from the computational part of combinatorics so as to provide a core component of our line of reasoning. In particular, we distribute the actual Java code for the algorithm created by (and documented in) Page (2012). Thanks to it, you will be able to generate the most appropriate morphospaces for the problem you have, by introducing a few explicit restrictions that allow you to cope with the task at hand.
When we use the name "vector" (or "row vector") to denote shapes, we mean the word in its linear-algebra sense. Moreover, in our line of reasoning there exist connections to what in "dynamical systems" are called "semi-cascades". However, we reject the Newtonian conceptualization of time and the assumed structural similarity between space and time, and our developments are original.
We intend to contribute to evolutionary economics, in which the learned reader will find elaborations somewhat close to ours. We share a certain cultural background with general systems theory (von Bertalanffy). Moreover, there are obvious similarities with the mathematics of sets and groups. But we are not particularly interested in drawing language and methods from the latter. The reader will be able to judge at the end whether this choice has been fruitful.
One of the many advantages of our method
Before we proceed, we would like to point to a decisive benefit of adopting a temporal morphogenesis approach: by generating a morphospace that is exhaustive of all possible shapes, you get a structured container onto which both real (empirical) and artificial (model-generated) data structures can be mapped. Thus you will have precise indications about what distinguishes real from non-realised shapes (i.e. shapes that are possible but not empirically detected), as well as about how extensively models can replicate real structures. Models can then be compared on the basis of how many (and which) real structures they can generate.
Moreover, the discovery of new structures becomes a possibility for models, substantiating their power. Scientific knowledge gains an avenue for cumulativeness, in the sense that new models may be directed at generating structures not yet generated by current models but that exist or that might exist (i.e. are in the morphospace but not in the subset of the empirically detected).
The world we are talking about has a certain number n of entities, counted as units. It has a certain number s of features, counted as classes. Each unit belongs to one and only one class, because it is characterised by that feature. In every class there is a certain number of units, from zero to n. The total number of units across all classes is n.
In this world, change means that a unit changes its class: instead of belonging to one, it belongs to another. By enumerating all possible states, one covers all possible changes (from one state to another). Several units can change their class at the same time.
For terminological consistency, we shall call "changes" the movements of the units over different classes, "shifts" the movements of the shapes, and "modifications" all other possible movements. Morphospace is the exhaustive list of all shapes sharing a certain feature.
If n = 1 and s = 1, which we shall be denoting shapes(1,1), the only possible state is (1). No change is possible.
If n = 8 and s = 1, thus the morphospace can be called shapes(8,1), again there is only one possible state, namely (8). No change is possible. The necessary condition for change to happen is that there are at least two classes to which a unit can belong.
If n = 1 and s = 2, thus shapes(1,2) there will be two different possible states: (1 0) and (0 1). Change can be from (1 0) to (0 1) and from (0 1) to (1 0). Accordingly, there are two possible changes.
Shapes(1, 3) allows for three possible states (1 0 0), (0 1 0) and (0 0 1). Accordingly, there are 6 different changes:
As you can see, change proceeds one step at a time. Once arrived in the new position, a further change is possible. A sequence of several changes can be mapped by re-applying the same table (what was in the second column is now sought in the first column). One can have a deterministic or a stochastic process (the latter with equal or unequal, stable or changing probabilities of shifting to another shape). What is important is that all possible changes are contained in the table and that, after a certain number of steps, you can get a clearer idea of the types of shifts that a shape can undergo. Tentative terminology can be established, with formal and operational definitions, subject to subsequent testing. Oscillation between two shapes, a final shape that terminates all processes (a kind of attractor), a never-ending change: these are just three possible characterisations of (groups of) sequences.
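The states and changes of a tiny morphospace such as shapes(1, s) can also be produced mechanically. The following is a minimal Java sketch of our own (not the Page (2012) code distributed later); the class and method names are ours, for illustration only.

```java
import java.util.*;

public class TinyMorphospace {
    // All states of shapes(1, s): the single unit sits in exactly one of s classes.
    public static List<int[]> states(int s) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < s; i++) {
            int[] shape = new int[s];
            shape[i] = 1;
            out.add(shape);
        }
        return out;
    }

    // Number of ordered pairs (from, to) of distinct states: s * (s - 1) changes.
    public static int countChanges(int s) {
        int n = states(s).size();
        return n * (n - 1);
    }
}
```

For s = 3 this reproduces the table above: three states and six changes.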
In what follows we use not the mathematical notation but rather computer programming conventions for operations as follows:
* means multiplication, thus 3 * 4 = 12
/ means division, thus 12 / 4 = 3
^ means "to the power of", thus 3^2 = 9. However, we continue to use the ! sign for factorial.
The number of changes in shapes(1, s) is s * (s - 1), written in mathematical notation as s! / (s - 2)!. In the previous example, 3 * (3 - 1) = 6, which happens to coincide with 3! only because s = 3.
But we do not want to limit ourselves to counting the changes: we want to enumerate them, so as to be able to show them, and their sequences, on a timescale, while opening up an avenue for an analysis of the abstract processes that can lead from one shape to another.
Shapes(2, 3) allows for the following 6 states:
The following 6 * 5 = 30 changes are possible:
Shapes(3, 3) allows for 10 states and 90 changes. Shapes(5, 5) comprises 126 states and 15750 changes.
Some comments can already be made: first, the problem seems very easy at the very beginning, but it soon becomes explosively large. An elegant counting formula does exist - combinatorics gives the number of states as the binomial coefficient C(n + s - 1, s - 1) (the "stars and bars" result), and the number of changes is then simply nstates * (nstates - 1) - but such a formula only counts; it does not give the full list of states (and even less of changes). For the morphospace (the full list of shapes) you need programming code, which we are going to give you in the next chapter.
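The counting side can be checked in a few lines of Java. The sketch below is our own illustration (the class and method names are ours): it computes the number of states via the stars-and-bars binomial C(n + s - 1, s - 1) and the number of ordered changes as nstates * (nstates - 1).

```java
public class MorphospaceCounts {
    // Number of shapes in shapes(n, s): the stars-and-bars binomial C(n + s - 1, s - 1).
    public static long nStates(int n, int s) {
        return binomial(n + s - 1, s - 1);
    }

    // Number of ordered pairs of distinct shapes, i.e. possible changes.
    public static long nChanges(int n, int s) {
        long N = nStates(n, s);
        return N * (N - 1);
    }

    // Multiplicative evaluation of C(a, b); each intermediate value is an integer.
    static long binomial(int a, int b) {
        long r = 1;
        for (int i = 1; i <= b; i++) r = r * (a - b + i) / i;
        return r;
    }
}
```

For the canonical representation this gives nStates(50, 5) = 316 251, matching the distributed file.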
The method is far from trivial: it leverages code offered as recently as 2012 by Daniel R. Page, whose precious personal cooperation we gratefully acknowledge. That text, devoted to restricted weak composition generation, states that the cardinality of the solution of a generation problem is the same as the solution of the corresponding counting problem: you do not know how many shapes you will generate until you actually generate them (or, better, obtaining the count takes the same number of steps, in a computational sense). In that paper, formal definitions are provided and a theorem about the correctness of the algorithm is proved; the algorithm itself is provided and documented in the appendix.
In the terms of Page (2012), we are generating the weak compositions of the integer n into s parts. A strong composition requires each addend to be strictly positive; we instead choose the weak composition because the number of addends must always be equal to s, with zeros allowed in any position. For instance, the list of all possible shapes(1, 2) cannot include the result (1); instead, (1, 0) and (0, 1) need to be included. We thus restrict the weak compositions to those in which the number of columns (addends) is exactly s.
In turn, the algorithm allows for very versatile constraints on the composition of each class (not only the standard case of zero to n).
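To make the notion concrete, here is a naive recursive sketch of restricted weak composition generation - ours, for illustration only, and far less efficient than the Page (2012) algorithm distributed in the appendix. Each class i is constrained to the hypothetical interval lo[i]..hi[i], mirroring the per-class restrictions mentioned above.

```java
import java.util.*;

public class WeakCompositions {
    // All weak compositions of n into lo.length parts, with per-class bounds lo[i]..hi[i].
    public static List<int[]> generate(int n, int[] lo, int[] hi) {
        List<int[]> out = new ArrayList<>();
        recurse(n, 0, lo, hi, new int[lo.length], out);
        return out;
    }

    // Fill class k with every admissible value, then recurse on the remaining classes.
    private static void recurse(int rest, int k, int[] lo, int[] hi, int[] cur, List<int[]> out) {
        if (k == cur.length) {
            if (rest == 0) out.add(cur.clone());
            return;
        }
        for (int v = lo[k]; v <= Math.min(hi[k], rest); v++) {
            cur[k] = v;
            recurse(rest - v, k + 1, lo, hi, cur, out);
        }
    }
}
```

With bounds 0..n on every class this reproduces the unrestricted morphospace, e.g. the 231 shapes of shapes(20, 3).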
Here you have the distributable implementation in actual Java code, which you can download from here, taking into account these technical requirements and notes. You do not need to know the Java language specifically, although some acquaintance with computer programming would help.
You can use the Java code to generate the full list of states and changes for any given n and s. The results will be given to you in a csv file, which can be read with any text editor or spreadsheet program such as MS Excel.
Since computation takes time but can be done once and for all for a given morphospace shapes(n, s), we select for distribution as an MS Excel file the application of the core method to shapes(50, 5), which gives 316 251 states and 100 014 378 750 (100 billion and something) possible changes. We chose this as the "canonical representation" since any number n can be normalised as a percentage of a total, and this is routinely done in the statistical analysis of populations (of people, households, companies, etc.), especially in economics. In what follows we shall alternatively interpret the number n as actual units (e.g. here), as the total corresponding to 100% with quantized percentages, or as the total probability corresponding to 1 with quantized probabilities (e.g. here).
One might have preferred to use shapes(100, 5), but for current computational reasons we need to limit ourselves to n = 50, which in turn means that each unit represents 2% of the total. If you have empirical data to fit in, you take the nearest multiple of 2 (0, 2, 4, ..., 100), with a maximum theoretical approximation error of 1%.
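The rounding just described can be sketched as follows (a hypothetical helper of ours, not part of the distributed code): an empirical percentage is mapped to a whole number of units, i.e. to the nearest multiple of 2 percent when n = 50.

```java
public class Quantize {
    // Map a percentage (0..100) to a whole number of units when n = 50,
    // i.e. each unit stands for 2% of the total.
    public static int toUnits(double percent) {
        return (int) Math.round(percent / 2.0);   // 0..50 units
    }

    // The same value expressed back as the nearest even percentage (0, 2, 4, ..., 100).
    public static int toNearestEvenPercent(double percent) {
        return 2 * toUnits(percent);
    }
}
```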
On the other hand, the number of classes (namely 5) is simply a good number, in the sense proposed by our pluralistic approach. If you need more classes, we put several other morphospaces at your disposal in Appendix 1. Conversely, you can run the core code for any n and s, with response time and memory requirements depending on the numbers you choose.
The morphospace can be reordered in many ways, but we set the following rule as the standard order for calling each shape by a unique identifier (IDShape): order the shapes in descending order of the numerosity of class 1, then of class 2, then of class 3, then of class 4, then of class 5 (for the canonical representation), then of class 6, etc. (for morphospaces with a larger number of classes).
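The ordering rule can be expressed as a comparator. The following Java sketch is ours; in particular, the convention that IDShape is the 1-based position in the sorted list is our assumption for illustration, not a documented property of the released files.

```java
import java.util.*;

public class ShapeOrder {
    // Descending order of class 1, ties broken by class 2, then class 3, and so on.
    public static final Comparator<int[]> CANONICAL = (a, b) -> {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return Integer.compare(b[i], a[i]);
        }
        return 0;
    };

    // Assign IDShape = 1-based position in the canonically sorted list.
    public static Map<String, Integer> idShapes(List<int[]> shapes) {
        shapes.sort(CANONICAL);
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (int i = 0; i < shapes.size(); i++) ids.put(Arrays.toString(shapes.get(i)), i + 1);
        return ids;
    }
}
```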
For further morphospaces (lists of shapes), see below in the text and in the Appendix 1.
Since we are interested in mapping change, by describing the full space of all possible changes, the relationship with the concept of time is essential. We have already presented here a conceptualisation of this very controversial topic, in which we state, in brief, that the bricks of time are segments of an extended present, which are phases of broader processes, which in turn are sequences of phases. The actual duration of each phase (and of whole processes) is an issue for chronometric time, whereas the structure of phases and the operations made on them are issues for what we define as logical time. This conceptualization differs from Newtonian time, which is modelled upon the continuity of space, is reversible, and is dominated by quantitative measurements. Our approach is much more in line with human and societal irreversible processes, where qualitatively different phases occur at a different pace in different countries but with many commonalities.
In a phase of logical time, one or more units switch from one class to another, thus modifying their status. Every phase has a change: until there is a change, the phase is prolonged. Sub-phases are possible, as envisaged in the general conceptualization of time we refer to. Thus a sub-phase might be defined as having only one unit switch. Since, however, two units may switch at exactly the same time, one would rather craft a structure where every unit has its own storyline and the sub-phase containing only one unit switch is referred to that storyline, whereas for the system one would need a vertical fusion of lines.
The duration in minutes, days, months or years of a phase is a matter for chronometric time. Thus an accelerated dynamics can come from shorter chronometric-time attributions of the logical phases. This is distinct from a situation in logical time in which an increased number of units switch within a phase (which is also, in a way, an acceleration of change). Deceleration of change is symmetrically given a double meaning, one in chronometric time and the other in logical time.
These accelerating and decelerating dynamics at the unit level may not be reflected at the macro level, thus in the shape. If, during the same phase, the same number of units change from s1 to s2 as change from s2 to s1, the shape remains the same. If there is a discrepancy between the two numbers but some symmetric flow still exists, the shape will differ from the one that would be produced by the first flow alone or by the second alone. If the two flows occur in subsequent phases, however, there will be a reversed change (first a shift from shape A to shape B, then from shape B back to shape A).
If flows in the same phase involve a third class s3, then the resulting shape C will be different from both A and B.
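The point about symmetric flows can be made precise with a small sketch (ours, for illustration): given a flow matrix flow[i][j] counting the units moving from class i to class j within a phase, the resulting shape depends only on the net flows, so perfectly symmetric flows leave the shape unchanged.

```java
public class NetFlows {
    // Apply within-phase flows to a shape: flow[i][j] = units moving from class i to class j.
    // Symmetric flows cancel out and leave the shape unchanged.
    public static int[] apply(int[] shape, int[][] flow) {
        int s = shape.length;
        int[] next = shape.clone();
        for (int i = 0; i < s; i++)
            for (int j = 0; j < s; j++) {
                next[i] -= flow[i][j];   // outflow from class i
                next[j] += flow[i][j];   // inflow into class j
            }
        return next;
    }
}
```

For instance, two units moving 1 → 2 while two others move 2 → 1 reproduce the starting shape exactly.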
All this can be summarised by saying that temporal morphogenesis, even if due to microdynamics at the unit level, is not trivial. There is a profound connection between micro and macro, but it does not hold under a ceteris paribus clause, nor is it bi-univocal. Once the dynamics of the other units are included, there can be large movements of units without any movement in the shape, or the emergence of a shape that visibly deviates from what you would expect by considering only the dynamics of a sub-group of units.
We take the occasion to make explicit that the process of change can equally be stochastic, deterministic (thus traceable to exogenous variable changes), or voluntaristic (in the sense that it derives from decisions that the unit itself takes, through a decision-making process that can be known - or not - to external observers and to modellers). Everything we say is applicable in any of these cases.
If the process is stochastic, it may rely, among other alternatives, 1. on a transition matrix which specifies an objective probability for any unit of a class to move into any class (including its own); 2. on a unit-specific probability of changing class, possibly influenced by the unit's history; 3. on a combination of the two (with the probability for a unit containing an idiosyncratic and a class-specific component). This means that temporal morphogenesis can cope with non-Markovian processes and be used to model them. This is very appealing for analysing human systems, because the stochastic processes that characterise them can be seen as largely dominated by non-Markovian processes.
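Alternative 1 (a class-level transition matrix) can be sketched as follows. The class and method names are ours, for illustration only; a seeded Random keeps runs reproducible.

```java
import java.util.Random;

public class MarkovStep {
    // One phase of a class-level stochastic process: each unit of class i moves to
    // class j with probability p[i][j] (rows sum to 1; p[i][i] = probability of staying).
    public static int[] step(int[] shape, double[][] p, Random rng) {
        int[] next = new int[shape.length];
        for (int i = 0; i < shape.length; i++) {
            for (int u = 0; u < shape[i]; u++) {   // each unit draws its destination independently
                double r = rng.nextDouble(), cum = 0.0;
                int dest = p[i].length - 1;
                for (int j = 0; j < p[i].length; j++) {
                    cum += p[i][j];
                    if (r < cum) { dest = j; break; }
                }
                next[dest]++;
            }
        }
        return next;
    }
}
```

Iterating step() produces one storyline of shapes; degenerate matrices (rows of 0s and a single 1) recover a deterministic process as a special case.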
If the process is voluntaristic, units should have an internal decision-making mechanism, possibly with exogenous and endogenous components, thresholds, etc. For instance, a unit might shift up one class if it succeeds three times in a certain task, which in turn depends on the degree of difficulty, time and effort.
We are perfectly fine with the possibility that each unit may have an individualised way of deciding the class to which it wants to belong, and that it may even lie about it to us, the external observers. We do not assume a unified process for all units, a differential equation applied to all points of a space, in the logic of fluid dynamics. Moreover, units' individualised ways of making decisions may well contain references to the remote and near past, to macro variables, macro-shapes, meso-shapes and what other units have been doing over time. They can draw on the actual values of all of these, but also rely on some source of information (media, social networks, etc.) as well as on guesses and biased interpretations (including confirmation bias).
Units can belong to networks and be influenced by the history of their relationships with other nodes of the network.
Conversely, there is more than one point in common between what we call "shapes" and what statistics calls "distributions". However, we do not restrict ourselves to the case in which the attribution of a unit to a class contains a stochastic element. In terms of cultural orientation, we are not interested in finding smooth and continuous functions generating the distribution, which is in itself considered an empirical result and an approximation. On the contrary, we take shapes very seriously, as the generalised standard to which an analyst can fit empirical realities and with which a modeller can represent the results of, e.g., an agent-based model.
Let's start with the simple sequence of shapes: (20, 0, 0)-->(19, 1, 0)-->(19,0,1)-->(18,1,1)-->(18,0,2), each of which is included in the morphospace denoted as shapes(20,3).
This sequence can be triggered by a single unit changing from a class to a higher one in each phase of logical time, until it reaches the highest class. At that point, a new unit begins its travel up. Since the target classes are initially empty, there is no issue with the opposite dynamics. If this process were to continue unchanged, one would expect to see Class 3 grow over time, in symmetry with the shrinking of Class 1, with Class 2 serving as an intermediate step, always empty or very small (numerosity 1). Minor variants of this process might involve a larger number of units transitioning during the same phase of logical time - thus with a temporarily larger numerosity in Class 2 and a faster depletion of Class 1. The final state of this process can be expected to be (0, 0, 20).
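The process just described can be sketched as code. This is our own illustration, under the assumption that in each phase the unit that is highest but not yet in the top class moves up one class; this reproduces the sequence above and ends in the absorbing shape (0, 0, 20).

```java
import java.util.*;

public class AscentProcess {
    // One logical phase: the highest unit not yet in the top class moves up one class;
    // once only the top class is populated, the shape no longer changes.
    public static int[] phase(int[] shape) {
        int[] next = shape.clone();
        for (int i = shape.length - 2; i >= 0; i--) {
            if (next[i] > 0) { next[i]--; next[i + 1]++; return next; }
        }
        return next;   // absorbing shape (0, ..., 0, n)
    }

    // Full storyline from a starting shape to the absorbing shape.
    public static List<int[]> trajectory(int[] start) {
        List<int[]> seq = new ArrayList<>();
        seq.add(start.clone());
        int[] cur = start;
        while (true) {
            int[] nxt = phase(cur);
            if (Arrays.equals(nxt, cur)) break;
            seq.add(nxt);
            cur = nxt;
        }
        return seq;
    }
}
```

Starting from (20, 0, 0), the trajectory visits 41 shapes in total: each of the 20 units makes 2 upward moves.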
One needs very little imagination to construct the opposite process (with systematic change from a class to a lower one). If, however, the two processes happen at the same time, with equal or unequal intensity, a number of intermediate shapes appear, possibly with a dynamic equilibrium in which Class 2 is never empty or even becomes the largest in numerosity.
A certain shape may be the result of the prevalence of fall-inducing over rise-inducing processes; another the result of the opposite condition. Processes can also differ as to the number of classes a unit can go up (or down) at once: in the canonical shapes(50, 5), a process could include a jump of up to 4 classes.
The individual fate of each unit can end in any class, possibly with some units remaining in the same class for most phases while others make most of the movements most of the time (or, instead, with a more balanced participation of all units in the movements).
Which other processes can you identify? That is the avenue of analysis opened by temporal morphogenesis and by its new categories for understanding changes, shifts and modifications.
To take a broader view of the morphospace in which all these processes take place, let us now introduce in the following graph the full list of shapes(20, 3), which includes 231 shapes (each with three columns/classes).
From left to right, the numerosity of Class 3 is rising, taking units away from Class 1 and Class 2. The total is always 20. This is the exhaustive list of the ways in which the total of 20 can be divided in three classes of integer numerosity (0 to 20). The first shape is (20, 0, 0), the last one is (0, 0, 20). All intermediate possibilities are present in the figure. Since this graph is meant only to statically indicate all shapes, any shift in shape is a jump from one shape to another.
Each shape can be given a unique identifier (IDShape), so the description of the shift can be precisely indicated with the sequence of IDShapes.
An animation can show the morphogenesis even better, as it is here the case with a possible evolution from a flat uniform shape (10,10,10,10,10):
Animated GIF - if the animation does not start, click on it.
These shapes all belong to the morphospace shapes(50, 5). A verbal description of these shifts may include the observation that Classes 1 and 2 do not evolve, while a transitory fall in Class 4 (and growth in Class 5) gives way to a systematic growth of Class 4 from time 4 on, when Classes 3 and 5 contribute to it.
In terms of chronometric time, phase 1 is long, phase 10 very long, and all the others are pretty short. Changes in the properties of the animation can convey acceleration and deceleration in chronometric time (as opposed to logical time, where the number of units changing class is the main metric).
An example of the same shift, with a distorted chronometric time, is here:
Animated GIF - if the animation does not start, click on it.
If a statistical office takes a snapshot at regular chronometric intervals of 100, this might be the time series that it would highlight:
This is because, while the office takes a snapshot every 100, the duration of each phase is the following:
These discrepancies between the standard snapshots (meant to make values comparable) and the actual chronometric duration of real phases are part of the analysis that can be developed within a morphogenetic approach.
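The discrepancy can be illustrated with a sketch (ours; the durations used in the test are hypothetical): phases of unequal chronometric duration are sampled at regular snapshot times, and a short phase can be missed entirely by the observer.

```java
import java.util.*;

public class Snapshots {
    // Given phase durations (chronometric) and the shape holding during each phase,
    // return the shapes observed at regular snapshot times 0, step, 2*step, ..., horizon.
    public static List<int[]> observe(int[] durations, List<int[]> shapes, int step, int horizon) {
        List<int[]> seen = new ArrayList<>();
        for (int t = 0; t <= horizon; t += step) {
            int phase = 0, elapsed = 0;
            // Advance to the phase containing chronometric time t.
            while (phase < durations.length - 1 && elapsed + durations[phase] <= t) {
                elapsed += durations[phase];
                phase++;
            }
            seen.add(shapes.get(phase));
        }
        return seen;
    }
}
```

With phases lasting 150, 50 and 300 units of chronometric time and snapshots every 100, the middle phase is never observed at all.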
This paper is not the right place to systematically explore this issue, nor the rules that govern the shifts in shapes. But some very quick reflections may begin to shed light on certain, limited, aspects and provide pathways for future research.
Morphogenesis is the explanation of the process leading to a shape. Given the morphospace, you can highlight abstract processes leading from one shape to another, possibly passing through one or more intermediate shapes.
Given a certain abstract process (deterministic, stochastic or voluntaristic), you can take two approaches:
A. You take a shape as the starting point and you apply the process at unit level. You get to a new shape. You further apply the process to the new shape and you get a third one. The collection of the new shapes will be explained by the sequence (and the process).
B. You invert the process and retro-apply it from a shape, generating the preceding shapes.
It is clear that several pathways can lead to the same final shape, which in turn substantiates the pluralism of origins and causes, as in our overall methodological approach. Conversely, these two types of procedure embed an evolutionary way of providing an explanation:
1. starting from the past, operating a process and getting to the present, thus explaining the present with two components - a historical, unexplained state-of-the-world and a process of change;
2. starting from the present, operating an (inverted) process, thus generating the past states-of-the-world until an arbitrary remote past.
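Procedure B can be sketched for the simplest process, in which exactly one unit moves per phase (our illustration; names are ours): every shape that could precede the current one is obtained by undoing one possible single-unit move.

```java
import java.util.*;

public class InverseStep {
    // All shapes that can precede `shape` when the process moves exactly one unit
    // per phase: undo every possible single move j -> i by sending a unit back i -> j.
    public static List<int[]> predecessors(int[] shape) {
        List<int[]> preds = new ArrayList<>();
        for (int i = 0; i < shape.length; i++) {
            if (shape[i] == 0) continue;          // no unit can have just arrived in class i
            for (int j = 0; j < shape.length; j++) {
                if (j == i) continue;
                int[] prev = shape.clone();
                prev[i]--; prev[j]++;             // the unit was in class j before the phase
                preds.add(prev);
            }
        }
        return preds;
    }
}
```

Re-applying predecessors() recursively reconstructs the tree of possible pasts back to an arbitrarily remote starting point.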
The demonstrations that you can build are both positive (which are sufficient conditions, i.e. abstract processes and starting points, for generating a shape) and negative (e.g. why a shape cannot derive from a certain process unless the starting point is of a certain shape). The latter is made stronger by an approach that is able to generate the full morphospace, since then you can test all possibilities and your negation can be validated.
We would now like to underline that, in general, what shifts the shapes are the differences in the bilateral and multilateral flows of units. Every time the homeostasis of the dynamics collapses, a shift in shapes happens. To be more precise, a shift depends on the difference between the overall sums of inflows and outflows of each class occurring in the same phase.
It is pretty clear that there is a rich discussion to be opened on which rules lead to which shapes. The relative importance of the initial shape and of the process of change can vary: in certain cases of relatively neutral processes, the initial shape determines the final one; in other cases, the iteration of the process itself leads inevitably to a certain shape, irrespective of the initial conditions. This echoes, in the present context, the widely debated issues of chaotic dynamics, with its high sensitivity to initial conditions ("a butterfly here can lead to a hurricane in Florida"), or, on the contrary, of the irrelevance of initial conditions (as in many stochastic processes), where, whatever the initial condition, it is the process that determines the final one.
Three problems are now open: 1. to identify possible partitions of the space of all possible shapes; 2. to explore whether actual human systems tend to generate, and remain within, a sub-space of all possible shapes; 3. to combine the two previous problems and identify some partitions of particular interest for human systems.
We cannot develop all three avenues in full here. However, we sketch a few possible answers, at least to the first problem.
A partition separates groups of shapes in such a way that every shape can be attributed to a group. Since in mathematics a member of a partition is usually called a "class" - a name we have already used for something else - we propose that in the temporal morphogenesis approach the word "pattern" be used instead. Thus the universe of all possible shapes is partitioned into one, two or more "patterns". We also wish to signal that, at some point, the techniques of automatic pattern recognition may well play a role in such attribution.
To start in the easiest possible way, we propose to partition according to rules, leaving for more advanced treatments the use of metric distances from prototypical examples (computing all possible distances in the canonical morphospace would require around 10 billion values, so some restrictions need to be established before this second way becomes practical).
The rule we propose is the systematic comparison of the numerosity of each class with the numerosity of every other class, using the trichotomy "larger than", "smaller than", "equal to".
For instance, the shape (44, 3, 2, 1, 0), contained in the canonical morphospace, exhibits a pattern characterised by the fact that the first class is larger than all the others, the second is larger than the 3rd, 4th and 5th classes, the third is larger than the 4th and 5th, and, finally, the 4th class is larger than the 5th (but smaller than all the others). The shape (40, 4, 3, 2, 1) belongs to the same pattern.
Indeed, with some computation (which you find in this Excel file), the canonical morphospace, constituted as it is by 316 251 shapes, requires only 541 patterns to be exhaustively partitioned according to all possible values of the trichotomy across all 5 classes.
In particular, about 66.3% of all shapes are characterised by the absence of the "equal to" value of the trichotomy (each class is strictly larger or smaller than every other class). There is only 1 shape whose classes are all perfectly equal to each other: the shape (10, 10, 10, 10, 10).
The pattern described for (44, 3, 2, 1, 0) contains as many as 1747 shapes. If the emergence of shapes were a stochastic process with a uniform distribution, this pattern would be 1747 times more likely to emerge than the pattern containing only (10, 10, 10, 10, 10).
There are 120 patterns with 1747 shapes each: they are all the patterns without "equal to". The full distribution of the numerosity of each pattern of the partition is provided here:
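As a sketch (in Python; the function names are ours), the rule-based partition can be reproduced by enumerating all shapes of the canonical morphospace and grouping them by their trichotomy signature. The counts match the figures reported above:

```python
from itertools import combinations

def compositions(n, s):
    """Yield all s-tuples of non-negative integers summing to n."""
    if s == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, s - 1):
            yield (first,) + rest

def pattern(shape):
    """Trichotomy signature: the sign of every pairwise comparison."""
    sign = lambda a, b: (a > b) - (a < b)
    return tuple(sign(shape[i], shape[j])
                 for i, j in combinations(range(len(shape)), 2))

shapes = list(compositions(50, 5))          # the canonical morphospace
patterns = {}
for sh in shapes:
    patterns.setdefault(pattern(sh), []).append(sh)

print(len(shapes))    # 316251 shapes
print(len(patterns))  # 541 patterns
```

Note that 541 is the number of possible weak orderings of 5 classes, which is why the same patterns reappear in every morphospace with s = 5, whatever the value of n.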
A major advantage of the rule-based approach (and of the particular way in which we applied it above) is that it can be applied to morphospaces of any n, producing the same patterns. Thus shapes in a smaller morphospace can be attributed to the same patterns as shapes in larger morphospaces (in the sense of the extension of n, not of s). Indeed, we release here all such smaller morphospaces in the canonical representation; to lend these structures a touch of familiarity, we propose to call them the "smaller sisters" of the canonical morphospace. Conversely, the "larger sisters" of the canonical representation would be the morphospaces shapes(51,5), shapes(52,5), shapes(53,5) and so on.
In any case, to check whether this partition is already acceptable (i.e. able to separate shapes that humans consider qualitatively different and, conversely, to keep in the same pattern shapes that humans would consider similar), you can activate the procedure for interpersonal validation of the results that we developed here.
Other partitions can be proposed, e.g. starting from a metric for distances between shapes. Using deliberately lax introductory language, we would then suggest taking for each group a prototypical example, establishing a metric that somehow measures the difference from this prototypical example, and using the comparison with the "distances" from other groups' prototypical examples as an indication for attribution. If the distance from all groups' examples is "too high", then a residual group "Others" may be established. A simple metric could be the difference in numerosity of each class.
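A minimal sketch of this metric-based attribution follows; the prototype shapes, the group names and the cutoff value are illustrative assumptions of ours, not part of the text:

```python
def l1_distance(a, b):
    """Sum of class-by-class differences in numerosity (a simple metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def attribute(shape, prototypes, too_high=30):
    """Assign a shape to the group of the nearest prototypical example,
    or to the residual group 'Others' when every distance exceeds the
    cutoff. Prototypes and cutoff are illustrative choices."""
    best = min(prototypes, key=lambda g: l1_distance(shape, prototypes[g]))
    if l1_distance(shape, prototypes[best]) > too_high:
        return "Others"
    return best

prototypes = {
    "polarized": (44, 3, 2, 1, 0),
    "balanced":  (10, 10, 10, 10, 10),
}
print(attribute((40, 4, 3, 2, 1), prototypes))   # → polarized
```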
In another approach, one could formally use clustering techniques applied to the morphospace. A k-means clustering would produce k patterns, with k pre-set ex ante based on reasoning, or optimally chosen among a list of possible k, e.g. by optimising the Average Silhouette Width as done in Bektas (2019).
Moreover, you can take the medoids of all patterns identified with the rule given above and compute the dissimilarity matrix to generate this clustering of patterns. In Appendix 7 you will find a further approach to partitioning the morphospace, based on the number of times two shapes are included in the confidence list derived from a small sample, under an interpretation of shapes as categorical discrete probability distributions.
Until now, we considered the unit (micro) level and the shape (macro) level. Between them, we now insert a meso-level of shapes that combine together, thus producing other shapes. You can add two meso-shapes to get a third one. You can add more than two meso-shapes to get a new macro-shape. You can disaggregate a shape into two or more others. Meso-shapes are shapes, constituted by units and constituting macro-shapes.
Many very interesting questions arise (and can be systematically explored and answered):
1. under which conditions two meso-shapes share the same pattern as the macro-shape;
2. under which conditions one meso-shape determines a part of the macro-shape and another meso-shape determines the rest of it;
3. which shapes are neutral, so that the macro-shape is entirely determined by the others;
4. under which conditions two meso-shapes generate a macro-shape entirely different from both (i.e. a macro-shape that belongs to a pattern that is not the pattern(s) of the meso-shapes);
5. what happens when, instead of two meso-shapes, we consider three or more;
6. what the absolute and relative numerosity of each shape can tell about the abovementioned relations, with absolute numerosity being the total number of units distributed across all classes, and relative numerosity being the pairwise ratio of the absolute numerosities across all meso-shapes.
To begin approaching answers to these questions, let's introduce the basic operation of disaggregating a shape into two others by a numerical example.
In order to get the shape (12, 11, 10, 9, 8) from the shape (12, 11, 0, 0, 0) with only one further shape, you have to add the shape (0, 0, 10, 9, 8). Conversely, you obtain the latter by simply decreasing the numerosity of each class of the target shape by the numerosity of the same class of the initial shape. The first shape is characterised by n = 50, the second by n = 23, the third by n = 27. Only the first belongs to the canonical representation. This way of making the subtraction is simple and univocal (for a more complex definition of sum and subtraction of shapes belonging to the same morphospace, see Appendix 4).
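The class-wise sum and subtraction described above can be sketched as follows (a minimal sketch; the function names are ours):

```python
def add_shapes(a, b):
    """Class-wise sum of two shapes with the same number of classes."""
    return tuple(x + y for x, y in zip(a, b))

def subtract_shapes(a, b):
    """Class-wise difference; a valid disaggregation requires that no
    class of the result goes negative."""
    diff = tuple(x - y for x, y in zip(a, b))
    assert all(d >= 0 for d in diff), "not a valid disaggregation"
    return diff

macro = (12, 11, 10, 9, 8)     # n = 50, in the canonical morphospace
meso1 = (12, 11, 0, 0, 0)      # n = 23
meso2 = subtract_shapes(macro, meso1)
print(meso2)                   # → (0, 0, 10, 9, 8), with n = 27
```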
This definition of disaggregation involves shapes with the same number of classes. For you to operate disaggregations of the canonical representation, we release in Appendix 1 the 49 morphospaces characterised by n equal to any integer between 0 and 49 (what we called before its "smaller sisters"), so that you can experiment with sums in which the total numerosity is 50. In particular, you can systematically identify the resulting pattern of a sum between two shapes belonging to two different patterns. This is possible because, as noted before, a rule-based partition can be applied to morphospaces of any n (but the same s). Does the sum of any member of a pattern with a member of another pattern always result in the same pattern as outcome?
As we saw, if you split the shape (12, 11, 10, 9, 8) into two meso-shapes you can get the previous two, (12, 11, 0, 0, 0) and (0, 0, 10, 9, 8). But not only! There is a large range of other possibilities, including (10, 2, 7, 7, 7) plus (2, 9, 3, 2, 1) or, on the other hand, (5, 5, 5, 5, 5) plus (7, 6, 5, 4, 3). What does this tell you with respect to the previous question?
The number of meso-shapes that, summed up pairwise, would generate a shape (n1, n2, n3, n4, n5) is (n1+1)*(n2+1)*(n3+1)*(n4+1)*(n5+1). The presence of the addend 1 is due to zero being among the possible numerosities of a class. According to this formula, the shape (12, 11, 10, 9, 8) admits 13*12*11*10*9 addends for 13*12*11*10*9/2 sums. That means 154440 addends for 77220 sums.
The shape (44, 3, 2, 1, 0), in the same rule-based pattern as the previous one but significantly more polarized in its unit distribution, admits 45*4*3*2*1 addends for 45*4*3*2*1/2 sums. That means 1080 addends for 540 sums. Does this mean that there are far fewer meso-shapes adding up to a polarized distribution than to a more balanced one? If you want an answer, do your own computation across patterns.
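The formula can be checked by brute-force enumeration of all class-wise decompositions (a sketch; the function name is ours):

```python
from itertools import product
from math import prod

def decompositions(shape):
    """All meso-shapes m such that m plus (shape - m) gives back shape,
    i.e. every class-wise choice between 0 and n_i."""
    return list(product(*(range(n + 1) for n in shape)))

for sh in [(12, 11, 10, 9, 8), (44, 3, 2, 1, 0)]:
    addends = decompositions(sh)
    # the count agrees with (n1+1)*(n2+1)*...*(n5+1)
    assert len(addends) == prod(n + 1 for n in sh)
    print(sh, len(addends), "addends,", len(addends) // 2, "sums")
# (12, 11, 10, 9, 8): 154440 addends, 77220 sums
# (44, 3, 2, 1, 0):     1080 addends,   540 sums
```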
The meso-shapes are constituted by units like all other shapes; they undergo the same shifts, due to the micro-dynamics. What is new is the exploration of the different ways meso-shapes interact and lead to macro-shapes (and vice versa: what can be inferred from the macro-shape in terms of the shapes it can be split into).
This is a fundamental part of temporal morphogenesis: a shape may well derive from the sum (or any other type of interaction, including min-max skylines and non-linear interactions) of two or more shapes; shifts in a shape over time may well be due to (and explained by) shifts in the constituting meso-shapes (as well as by the inclusion of new meso-shapes, the disappearance of one or more meso-shapes, etc.).
One should, meanwhile, note that one reason for the existence and identification of meso-shapes is that, when two aspects of the units are considered, each meso-shape contains the units belonging to the same class of the second aspect. If three or more aspects of the units are considered, then each meso-shape can be defined as containing the units in the same cell of the cross-matrix of all classes of all aspects but the first.
This is a very general formulation, whose examples may be very simple. Meso-shapes can correspond to different geographical areas in a country, with the macro-shape corresponding to the country. Here units (e.g. companies) have two aspects: the main one under consideration (e.g. the size of the workforce in discrete classes) and their geographical location. If we consider as a further aspect the economic sector to which units belong, then sectors themselves are meso-shapes. Now units have three aspects (the main one, the geographical one and the sectoral one). Meso-shapes can be derived for each sector in each geographical region.
In short, temporal morphogenesis is about the generation over time of shapes from shapes, by means of unit-level microdynamics and/or operations embracing the meso-shapes. This can be done formally and systematically using the canonical representation (the list of all possible shapes under a certain numerosity of units and classes), taking every two shapes (within and across patterns) and summing them up to obtain a macro-shape, of which the pattern is then retrieved.
Classes can be defined freely, by very simple criteria or by multiple and very complex ones. How do we know that a unit belongs to one class and not to another? This question can lead to two very different, but complementary, methodological decisions. On one hand, we can always add to the set of all classes an additional class, "we do not know", and treat it like any other (with every possible change from and to this class). Indeed, such answers are often coded in empirical surveys, with special methodologies developed to cope with the refusal to answer leading to a missing value (which may be deduced or imputed based on other information we collect).
On the other hand, belonging to a class may be the result of a computation, perhaps as simple as lying between two thresholds in a single variable or as complex as being part of a cluster. In the first case, the thresholds themselves can be fixed, thus requiring only the value for the unit to be known, or dependent on all other units' values, or moving over time. For instance, in a shape(n, 2), a person can belong to the class "poor" because his or her gross yearly income is below a certain absolute threshold in US dollars, given the current exchange rate, or to the class "not-poor" if he or she is above it. Inside the US, the attribution would require knowing only one value (and a large amount of imprecision is admissible if it is well above the threshold). Note that you can have a lot of proxies for estimating income (with some risk of mis-attribution). But poverty can also be defined in relative terms, so one needs to know the incomes of all people (or an average given an aggregate), establish percentage thresholds and proceed to classify. These thresholds would move every year (or at least every time new statistics become available). Accordingly, some units might change their class not because they modified their income but because others did. Moreover, poverty can well be defined over a broader set of variables than income, as we describe here.
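The contrast between absolute and relative attribution can be sketched as follows; the threshold values, the use of the median and the 60% fraction are illustrative assumptions of ours, not prescriptions of the text:

```python
def absolute_class(income, threshold=15000):
    """Attribution by a fixed absolute threshold (value illustrative):
    only the unit's own income needs to be known."""
    return "poor" if income < threshold else "not-poor"

def relative_classes(incomes, fraction=0.6):
    """Attribution relative to the median (fraction illustrative): the
    threshold moves whenever the whole distribution does, so a unit can
    change class even if its own income is unchanged."""
    ranked = sorted(incomes)
    median = ranked[len(ranked) // 2]
    cutoff = fraction * median
    return ["poor" if x < cutoff else "not-poor" for x in incomes]

print(absolute_class(10000))                   # → poor
print(relative_classes([10, 20, 30, 40, 50]))  # threshold is 0.6 * 30 = 18
```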
Until now, n and s were fixed. Units do not escape the system, do not die, do not split, etc. They cannot acquire new features (classes).
The removal of these assumptions is simple: by running the core routine several times you compute shapes(new n, new s). This can be done once and for all, then drawing on those results to produce shapes that differ by n and s.
In general, n is an integer. Indeed, the core code does not allow this condition to be relaxed. If you need to include terminating decimals (i.e. numbers with a finite number of digits after the decimal sign), you will have to tweak the routine (e.g. performing it with integers and then elaborating the results), but you need to be careful in your procedure. We do not offer avenues for doing it.
If a unit belongs to an intermediate category between two successive others, you may add (s - 1) new columns and define them as intermediate. If the unit may belong to an intermediate category between any two others, you may add (s-1)+(s-2)+(s-3)+...+1 new columns. In technical terms, this is very easy, but it would require morphospaces with a lot of columns, which in turn would require massive new computations, or else satisfying yourself with a pretty low level of precision (i.e. number of units).
If categories are fuzzy, so that units belong to more than one class with a fuzzy degree of truth between 0 and 1, you can compute the same state-of-the-world several times and verify whether, at the aggregate level, the shape remains the same or only a few shapes result from this condition.
Similarly, if a unit has a certain probability of belonging to a class and other probabilities of belonging to other classes, you compute the shape many times, in a Monte Carlo fashion, drawing (pseudo-)random numbers, and verify whether at the aggregate level the shape remains the same (because of the large number of units and of implicit bilateral flows across categories) or a limited number of shapes results from this condition. In particular, it would be relevant to check whether these shapes belong to the same pattern.
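A minimal Monte Carlo sketch of this check follows; the probability vector, the number of draws and the seed are illustrative assumptions of ours:

```python
import random

def sample_shape(probabilities, rng):
    """Draw a class for every unit from its probability vector and
    count the resulting shape."""
    s = len(probabilities[0])
    counts = [0] * s
    for p in probabilities:
        counts[rng.choices(range(s), weights=p)[0]] += 1
    return tuple(counts)

rng = random.Random(0)
# 50 units, each with the same illustrative class probabilities
probs = [[0.5, 0.2, 0.15, 0.1, 0.05]] * 50
draws = {sample_shape(probs, rng) for _ in range(1000)}
print(len(draws))   # how many distinct shapes this condition produces
```

Replacing the set of shapes with the set of their patterns (ch. 7) would answer the stricter question of whether all draws fall in the same pattern.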
Multiple classifications according to different aspects can, at least in part, be approached by allocating blocks of classes (columns) to each aspect.
For instance, the graphical example we presented in the Overview would be interpreted in this way: of the 10 classes of the morphospace shapes(10,10), the first two would be used to represent geometrical shapes, a further 3 the sizes, and a further 4 the colours. The 10 objects would appear once in each block, thus you would actually need shapes(30,10).
More generally, each block contains n units, thus the row totals are n times the number of aspects, i.e. of column blocks.
Thus introducing multiple classifications means considering a shape(n*aspects, sum of blocks) in which, however, certain shapes are forbidden (you cannot have more or less than n in the row sum within each block).
The core code presented in ch. 4 can produce such a list: it produces a larger morphospace from which you select only the rows matching these restrictions. Accordingly, this extension does not alter the whole discussion, mutatis mutandis.
For instance, 3 objects classified along 2 aspects (e.g. colour and size) with 2 classes each (e.g. blue and yellow; small and big) require a morphospace shapes(6, 4) - computed by the core code - of which you extract the rows in which the first two columns sum to 3 and the second two columns sum to 3:
In the first state-of-the-world, three blue objects are small. In the last, 3 yellow objects are big.
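The extraction of the restricted morphospace can be sketched as follows (a sketch of ours, not the core code itself); with the conventional descending order, the first and last rows are exactly the two states-of-the-world just described:

```python
from itertools import product

def restricted_morphospace(n, blocks):
    """All shapes in which every block of classes sums to n;
    blocks gives the number of classes per aspect."""
    def block_shapes(n, s):
        if s == 1:
            return [(n,)]
        return [(f,) + rest for f in range(n + 1)
                for rest in block_shapes(n - f, s - 1)]
    parts = [block_shapes(n, b) for b in blocks]
    return [sum(combo, ()) for combo in product(*parts)]

# 3 objects, 2 aspects (colour: blue/yellow; size: small/big)
space = restricted_morphospace(3, [2, 2])
space.sort(reverse=True)      # conventional descending order
print(len(space))             # 16 states-of-the-world (4 x 4)
print(space[0])               # (3, 0, 3, 0): three blue objects, all small
print(space[-1])              # (0, 3, 0, 3): three yellow objects, all big
```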
Another approach to multiple classifications is the nested approach in which you first distribute the n units according to one aspect, then the numerosity of each class becomes the total for a new distribution according to the second aspect, and further applications are constructed for all the other aspects. An extensive example of the nested approach is presented in Appendix 3.
The limitation that n is known in advance and that all shapes sum to n may turn out to be annoying in certain circumstances. One might want to interpret classes as periods of time, and having a total n known in advance would then seem to deprive the future of true surprise. If in previous periods the sum is already near n, then the future is constrained to be "small" and, from a point on, zero forever.
If this is your case at hand, then a new approach is called for, leaving behind the core method we presented in ch. 4, whose routines hinge on the total n to drastically reduce the number of shapes. Then we need a new beginning.
Let us interpret classes as periods of logical time. If you are fine with a modest number of periods and of units per period, you can take a morphospace made of n^s shapes, where n is larger than the expected maximum numerosity (e.g. given an empirical distribution, the largest class). Since this number grows very fast, we provide you with the 161 051 shapes of the morphospace shapes(0 to 10, 0 to 10, 0 to 10, 0 to 10, 0 to 10), which is the morphospace with 5 classes without any restriction on the sum of units but with each class restricted to contain an integer between zero and ten (both included). Its numerosity is 11^5, since each class can contain 11 values.
If you actually need to have a large and growing number of periods, you probably need a new way to reduce the number of shapes.
Let us set a limit, say 100, on the number of units in each period, but with a rising constraint on the total over t periods (t * 100). This number can fall to zero. Every time a new period is added, the total number of possible units rises (but one knows what happened before). Thus there are 101 alternatives per period (the 100 positive integers and zero). This brings the total number of possible shapes (with t classes) to 101^t. It is a pretty big number already from t = 4 on (more than 100 million).
We can drastically reduce this fast-rising number of shapes by looking only at the change with respect to the previous period. Let change be described by 5 categories (large rise, rise, more or less the same, fall, large fall). Numerical thresholds can be proposed to attribute each numerical change to a category. Once this is done, you get 5^(t-1) shapes. You might want to have a first period in which the numerical values themselves are categorized into 5 levels (high, upper middle, middle, lower middle, low), so as to have a starting point for the change. This leads to 5^t shapes.
If in the third period you have a change that is the same as that from t1 to t2, then you have "monotonicity"; if not, you get "turbulence" (or one among "reversal", "deceleration", "acceleration"). If you compact all the latter labels into just one category ("turbulence"), then you have 2^(t-2) shapes (times the initial conditions in the first two periods). By renouncing numbers in favour of qualitative characterisation, we drastically reduce the number of shapes (each shape becomes more and more similar to a pattern in the sense discussed in ch. 7).
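One reading of this compacted reduction, comparing each change with the one before it, can be sketched as follows (the function name is ours):

```python
def qualitative_track(changes):
    """From the second change on, label each step 'monotonicity' if it
    repeats the previous change and 'turbulence' otherwise: for t
    periods there are t-1 changes and hence t-2 binary labels."""
    return ["monotonicity" if cur == prev else "turbulence"
            for prev, cur in zip(changes, changes[1:])]

print(qualitative_track(["+", "+", "-", "+"]))
# → ['monotonicity', 'turbulence', 'turbulence']
```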
Further gains in lowering the number of shapes can be achieved by using even lower levels of detail and explicitly managing errors.
Calling ++ the large rise, + the rise, = the more or less constant value, - the fall and -- the large fall, you can have an algebra of sequences of the signs ++, +, =, -, --, with the final result given by the sum of signs (still constrained to be at most ++ and at least --). If what matters to you is the final result with respect to the initial conditions, you can compute the sum of the signs and get only 5 possible outcomes.
It is necessary to recognize that this algebra can fail, so a True or False accompanies the previous indication of the outcome. The dimensionality of this computation is 2. You can study under which conditions the algebra is True or False by taking all possible definitions of the signs and testing the algebra for all possible numerical values (0, ..., 100). For instance, let us initially define ++ as an increase of 50 or more, + as an increase between 5 and 49, = as plus or minus 5, - as a fall between 5 and 49, and -- as a fall of 50 or more. Let the initial conditions be -- for values below 25, - for values between 25 and 45, = for values between 45 and 55, + for values between 55 and 75, and ++ for values above 75.
Then, pairwise, you take 101 x 101 alternative "sums" and verify in which cases the algebra works. For instance, 66 + 34 = 100 reduced to signs becomes: ++ plus + leads to ++: True.
(+34) plus (+22) leads to ++: True.
(+12) plus (+7) leads to + (while the sign algebra yields ++): False.
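Under one reading of the definitions above (thresholds as stated; the edge cases at exactly 5 and 50 are our assumption), the exhaustive verification can be sketched as:

```python
def change_sign(delta):
    """Categorise a numerical change using the thresholds in the text."""
    if delta >= 50:  return "++"
    if delta >= 5:   return "+"
    if delta > -5:   return "="
    if delta > -50:  return "-"
    return "--"

LEVEL = {"--": -2, "-": -1, "=": 0, "+": 1, "++": 2}
SIGN = {v: k for k, v in LEVEL.items()}

def algebra_sum(a, b):
    """'Sum of signs', constrained to be at most ++ and at least --."""
    return SIGN[max(-2, min(2, LEVEL[a] + LEVEL[b]))]

def algebra_holds(d1, d2):
    """True when the sign algebra agrees with the sign of the actual sum."""
    return algebra_sum(change_sign(d1), change_sign(d2)) == change_sign(d1 + d2)

print(algebra_holds(66, 34))   # ++ plus + -> ++, actual +100 is ++: True
print(algebra_holds(12, 7))    # +  plus + -> ++, actual  +19 is  +: False
```

Looping `algebra_holds` over all 101 x 101 pairs would map exactly where the algebra is True and where it fails.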
Moreover, once the categories are interpreted as periods of time (see also Appendix 2), you can operate on them by imposing the operations on time described here. In particular, you can collapse two or more phases with the same sign (or go from a sequence of + into ++). The chronometric duration of the phases will increase but the number of phases will drop. Conversely, if you are studying a universe of shapes without numbers but with signs +, -, ++, you can split each phase into two or more with the same sign (or split a ++ into +, +). You can study whether this leads to False. This procedure clarifies the numerical limits of the algebra. If you judge it insufficient (because it generates too many False results for values that are relevant to your case at hand), you might try to increase the number of classes to +++++, ++++, +++, ++, +, etc.
The limit of a total of 100 in each phase can once again be interpreted as a percentage (with a total fall of at most 100%). Then you would probably like to have +100 and -100 as the limits. Please note that the reduction to five categories remains intact, with a different definition of the transformation from numbers to strings.
Indeed, you may completely remove the limit. This only impacts the "reverse engineering" of strings back to numbers, if it is necessary at all. By this expression we mean the re-attribution of a number to a sign. If ++ is defined as above, you may reverse-engineer it as a random number in the range 76 to 100. If you have no limit, you may accept a random distribution with an infinite domain, with probabilities falling as the numbers grow.
A shape is an ordered row vector of integers. A morphospace is the exhaustive list of shapes sharing a certain feature.
A typical shared feature is the scalar representing the sum of all values contained in the vector. This scalar can be interpreted as the number of entities at the micro-level. The dimensions of the ordered vector are called classes.
Shapes can be macro-shapes and meso-shapes, the difference lying not in any numerical property but in their use: macro-shapes are generated by micro-dynamics and by combinations of, and shifts in, the meso-shapes that constitute them.
Temporal morphogenesis is the study of the generation of shifts from shapes to shapes occurring over logical and chronometric time, traced back to changes at the micro-level and to operations embracing meso-shapes. Temporal morphogenesis identifies abstract processes (at the micro and meso levels) that lead to a shape. Abstract processes include the reclassification of units into classes.
To use the approach of temporal morphogenesis in empirical studies, you need to establish the most appropriate morphospace to which all possible states-of-the-world might in principle belong. You need to choose a definition for the classes and for the units. It may happen that the morphospace you need is already distributed here, or can be obtained by deleting some of the shapes in a distributed morphospace. It is entirely possible that you need to run the core code to obtain your morphospace. As for the definition of the classes, if the main variable you want to investigate is numerical with a lot of values, you might want to select intervals to fit a certain number of classes. The canonical morphospace has some additional elaborations, such as patterns (ch. 7) and the attribution to the periodic table of shapes (Appendix 5); its five classes may reflect the traditional Likert scale (low, lower-middle, middle, upper-middle, high), which can also be considered embedded in natural language (in the values of the adverbs that "quantify" an adjective). But in certain situations you may want two additional extremes (very low and very high). Both situations are covered here in terms of vertical differentiation.
Once you have obtained the morphospace, you order it in the conventional way and give an automatic IDShape to each shape. As long as the morphospace remains the same, this name will simply be the number in the sequential order. If you change morphospace, then you need to signal this in the IDShape.
We propose the following convention: if the morphospace has no further restrictions than the sum of units being fixed at n and the number of classes fixed at s, then the IDShape is (n,s).orderinthesequence. If you imposed special restrictions, it is better to have a string naming the morphospace, and the IDShape then becomes nameofthemorphospace.orderinthesequence. As defined in ch. 4, the order is conventionally the descending order of the numerosity of class 1, then of class 2, then of class 3, then of class 4, then of class 5 (for the canonical representation), then of class 6, etc. (for morphospaces with a larger number of classes).
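The naming convention can be sketched as follows (on a toy morphospace shapes(3,2); the function name is ours):

```python
def assign_idshapes(shapes, n, s):
    """Order a morphospace in the conventional descending order and
    assign sequential IDShapes of the form (n,s).orderinthesequence."""
    ordered = sorted(shapes, reverse=True)
    return {f"({n},{s}).{i}": sh for i, sh in enumerate(ordered, start=1)}

tiny = [(0, 3), (3, 0), (1, 2), (2, 1)]   # a toy morphospace shapes(3, 2)
for name, sh in assign_idshapes(tiny, 3, 2).items():
    print(name, sh)
# (3,2).1 (3, 0)
# (3,2).2 (2, 1)
# (3,2).3 (1, 2)
# (3,2).4 (0, 3)
```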
Then you take your data and fit them to the shapes of your morphospace. You will likely need to round values and compute percentages. In any case, you arrive at a description of your data in terms of IDShapes. If your data are temporally ordered, you get sequences of IDShapes. At this point you can begin to reflect upon which abstract processes could explain this sequence. It is likely that in the end you will need to run simulations with such abstract processes and derive an explanation from them.
Whether or not your data are temporally ordered, you can attempt to find a partition of the morphospace that, directly or through re-aggregation of patterns, separates all your empirical data from "non-existent" shapes. Ideally, you may find a single characterisation of all your data, considered as existing, and differentiate them from shapes that are theoretically possible (thus included in the morphospace) but not empirically detected.
If your data are wide enough to cover a relevant part of the world, you can then say something about the world itself.
Conversely, a finer analysis of your partition and its patterns (including their medoids and possibly reaggregation of medoids in clusters) allows you to innovatively analyse your data and answer your research questions.
At a certain point, you might want to de-quantize the values you obtain, either by referring each class to a continuous interval or by interpolation (including splines).
The main steps are as for empirical research: you need to single out a morphospace that is well suited to map your assumptions and results, with appropriate definitions of units and classes. Then you actually obtain the morphospace, either by running the code or by directly using a released morphospace (or a modification of it). You order the morphospace and compute the IDShapes, using the same convention as in ch. 12. Then you approximate the assumptions and the results, mapping each of them to a shape in the morphospace.
In this way, you characterise assumptions and results in terms of their shapes and patterns. It is a distinctive advantage to have the exhaustive list of them. This strengthens the conditions (e.g. necessary conditions, sufficient conditions) for "things to happen". For instance, you can demonstrate that, within your model, certain initial shapes never evolve into another list of shapes (the former and the latter may be patterns or cells in the morphotable).
Conversely, all possible applications of the model generate a certain list of shapes, and it is very relevant whether this covers all the morphospace or a sub-set of it (and which). Ideally, you would like to be able to replicate all the morphospace, or at least all of its sub-sets that empirical research has detected in the real world.
When your model can generate a shape that exists in the real world, you can claim that your model provides a possible explanation for reality, the more so if sequences of shapes can be generated by the model and detected as such in empirical research. You can consider it an argument for the realism of your model if the same stylized facts and emergent properties appear in both real and artificial worlds. This is part of the process of validating your model (which may include the capability of replicating a subset of shapes) and of making what-if analyses or even forecasts using your model.
Every one of the following examples would deserve a paper in itself, given the multiplicity of issues at hand and the novelties that temporal morphogenesis can bring to historically stratified mainstream and heterodox discussions. Such a paper might largely utilize the basic guidelines that we introduced in ch. 12 and ch. 13. Here, however, we limit ourselves to very briefly sketching a selection of the elements that might be derived from the abovementioned framework for topics of great theoretical and practical interest.
Personal and household income depends on work salaries, interest earned, capital gains, rent received and many other items. Each of them influences the total, which in turn is directed towards consumables and durable goods, services, taxes, rent paid, etc. These processes, coupled with macro (e.g. GDP, state tax revenue, wage level) and meso (sectoral, geographical and social-group) variables, deeply intertwine with electoral results, feelings of social justice, the speed of innovation diffusion, etc.
In socio-economic terms, the rich, the poor and the middle class can be mapped, with their potentially different decision rules and domains, but also disaggregated into further classes.
In all this, personal and functional income distribution intertwine, mediated by people's ownership of rights over functional flows. An increase in the income share of labour, for instance, leads to growing personal incomes for workers. Conversely, an increase in the income of capital leads to growing personal incomes for capitalists, according to their share in wealth (stocks, real estate), in both vertical and horizontal dimensions (i.e. how much capital, and in what it is invested / embedded).
In order to grasp these dynamics, temporal morphogenesis offers an exhaustive list of possible income distribution shapes. This morphospace is asymmetrically filled by international and national income distribution data, such as those provided by the UNU-WIDER World Income Inequality Database and by the World Bank PovcalNet.
The techniques of temporal morphogenesis allow different income distribution structures to be clustered and partitions to be applied to map the evolution of income distribution over time. Polarized income distributions, and the corresponding dynamics of "polarization", can be studied more precisely than by relying on single synthetic indicators such as the Gini coefficient. Conversely, wide middle-class income distributions (and the corresponding dynamics of relative income equalization) can be singled out by country and year, opening two avenues of enquiry: towards macro maps of convergence and towards the microdynamics that underpin change.
The speed of change can be measured with different indicators, referring both to logical and to chronometric time. Subjective political choices, retrieved from political and rhetorical manifestos, can be reality-checked. The relative strengths of different change processes at the unit level, sharing and dissipating macrovariables (such as GDP growth, stock exchange dynamics, real estate prices and yields, etc.), can be assessed and given quantitative expression.
Models of income distribution can be tested for their capability of generating realistic shapes and shifts, providing insights into more complex micro and meso dynamics. As you see, there are plenty of options to choose from (the unit of analysis, the meaning of classes, the particular restrictions to impose on the morphospace, the time frame, the geographical coverage, the social level of detail, etc.), but temporal morphogenesis can not only cope with them individually, but also provide bridges for considering several of them in combination (e.g. both personal and household level dynamics through mating preferences and the cumulative bundle).
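To make the contrast with single synthetic indicators concrete, here is a minimal Python sketch that computes the Gini coefficient of a shape read as a grouped income distribution. The shapes and the representative class incomes are hypothetical, chosen only for illustration.

```python
def gini(shape, class_incomes):
    """Gini coefficient of a shape read as a grouped income distribution:
    shape[k] units (people) each earn the representative income class_incomes[k]."""
    xs = [inc for count, inc in zip(shape, class_incomes) for _ in range(count)]
    n = len(xs)
    mean = sum(xs) / n
    # mean absolute difference over all ordered pairs of units
    mad = sum(abs(a - b) for a in xs for b in xs) / (n * n)
    return mad / (2 * mean)

uniform = gini((10, 10, 10, 10, 10), [1, 2, 3, 4, 5])  # a "wide middle class" shape
polar = gini((25, 0, 0, 0, 25), [1, 2, 3, 4, 5])       # a polarized shape
# polar > uniform: polarization raises the Gini, but the single number
# hides which of the many different shapes produced it
```

Two different shapes can yield very similar Gini values, which is precisely why clustering whole shapes into patterns carries more information than the scalar indicator.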
Market share structures and dynamics are classical topics of industrial economics. Monopoly, (symmetric and asymmetric) duopoly, different types of oligopolies, a market structure with an oligopolistic core and a very fragmented and turbulent fringe: all these structures are well covered, in particular by evolutionary models, such as for instance this. With temporal morphogenesis you can identify the full morphospace of all possible market structures, of which the empirical ones and those generated in models will cover a (more or less wide and overlapping) part. This makes it possible to evaluate whether the latter are capable of replicating all of the former and of providing important insights about sufficient conditions for morphogenesis, including all shifts from one market structure to another, in a systematic way.
This morphospace is the universe in which all market share structures can be represented. It becomes a sort of "periodic table" as in chemistry, especially if attributes can be devised for market structures. This could become a relevant argument in favour of the paradigm that includes the model that made a "discovery".
If an evolutionary model turns out to be capable of generating a new market structure (a macro-shape) that all previous models, including the neoclassical ones, were unable to produce, then to some extent it "discovers" that structure and can predict certain of its properties. If such a market structure is afterwards found in empirical data, this would help mainstream evolutionary economics, as happened:
* with certain astronomical theories that led to the hypothesis of planet Neptune's existence and were mainstreamed once Neptune was found;
* with the Mendeleev periodic table, which predicted the existence of then-undiscovered elements and their chemical properties, and was mainstreamed once such elements were found with the predicted properties.
Since Mitchell (1913), the business cycle is a co-variation of several micro, meso and macro time series. Their shape, approximating a sinusoid, somehow became a standard. However, over time, the neoclassical counter-revolution has been trying to erase the study of systematic co-variations, concentrating all interest on GDP alone, often interpreted as a random walk, and, when extending the analysis to more than one variable, introducing co-integration as an advanced but opaque device. Too many long-term growth models assume a constant rate of growth, whereas in empirical time series there is never a year when GDP grows at the same rate as the year before, as you can test yourself in this wide dataset or in alternative files from here. Instead of accepting empirical data as they are, they torture them, under batteries of statistical tests, until they confess the researcher's preferred thesis.
In our approach, we sharply distinguish logical time (phases) from chronometric time (the duration in months, quarters, ...). Thus we avoid the naive approach of earlier times, which looked for a constant or average duration of the business cycle. In our approach to time, the business cycle is a sequence of logical phases, with no determinism in terms of which phase follows which. There exists a good number of phases, which can include, for instance, expansion, boom, crisis, recession, depression and recovery, with possible deletion (skipping) of phases under certain circumstances (and, to repeat, no fixed duration of each).
We do not try to elicit mathematical curves, like sines, but offer thousands of different shapes to be fitted to empirical values.
Which morphospace to choose (or newly generate) is obviously highly relevant. Leveraging the nested approach to time proposed in Appendix 3, business cycles can be mapped onto the supra-yearly shapes (with the year being a class) and, more finely grained, using quarterly data. Countries with advanced and timely statistical data should make it possible to fill the latter empirically.
Meanwhile, mesoshapes can be economic sectors (e.g. in SITC or NACE classification) as well as sub-national regional values (or countries in an analysis of the world economy as a whole).
Moreover, there are different types of GDP growth, both in terms of vertical differentiation (negative, slow, fast...) and in terms of structural differences, such as the leading driver of growth (exports, domestic consumption, public expenditure, etc.). Accordingly, the implementation of temporal morphogenesis can utilize information criss-crossing these macro elements.
In this relatively long and articulated paper we presented a new approach to empirical and artificial data, based on abstract numerical structures (shapes and patterns). We provided programming code to generate exhaustive lists of shapes, both for the macro and the meso level. We only introductively hinted at possible abstract processes at the micro level that would shift one shape into another. We highlighted that mesoshapes (and the respective operations) can play an important role in reshaping the macro level.
We provided very practical tools and operative examples that allow for immediate utilization. But we also boldly envisaged many lines of development that are open to experimentation, discovery and further improvements. We drew on our broader methodology towards human beings, pluralism, and time to defend the relevance of this approach against a wave of mathematical methods rooted in unrealistic assumptions.
Temporal morphogenesis stands out as a unifying tool for many models and empirical analyses, for which its flexibility and openness is an advantage, not a burden.
To recall the distinction we made in general methodological terms between two roles in science production, in this paper we planted a seed for action by theoreticians, interested in developing abstract processes generating the shapes of the morphospaces, and by analysts, interested in mapping empirical data into shapes. Temporal morphogenesis can become a field where they, implicitly or explicitly, cooperate to produce a comprehensive understanding of how the (real and artificial) worlds evolve.
The main author of this text is Valentino Piana. Alperen Bektas contributed computations in ch. 7. Khoa Nguyen contributed to the Eclipse implementation of the core code. We are all indebted to Daniel R. Page for the core code and for the demonstration that it provides a valid generation algorithm for second-order restricted weak compositions.
Page, Daniel R., Generalized Algorithm for Restricted Weak Composition Generation: Generation Algorithm for Second-Order Restricted Weak Compositions, Journal of Mathematical Modelling and Algorithms, Springer, 2012. Available from https://link.springer.com/article/10.1007/s10852-012-9194-4
Bektas, Alperen, How to Optimize Gower Distance Weights for the k-Medoids Clustering
Mitchell, Wesley, Business cycles, the problem and its setting, University of California Press, 1913.
With this paper we freely distribute the following morphospaces:
Throughout the paper, except in the final discussion of ch. 10, a period of logical time is characterised by one shape, morphogenesis is the shift from one shape to another during a phase of time, and the duration of the period is a question deferred to chronometric time.
However, one may be tempted by a different approach: to consider a period of time as one class, with units being the values of a variable (e.g. GDP) and the shape being a progressive expansion of the number of classes (with the relative evolution of the variable as units). However, a direct implementation of this approach would lead to an open-ended number of classes without any sum limiting the total. In ch. 10 we approached this problem by drastically reducing the level of numerical detail of the covered entities.
In this appendix, we propose a nested approach: periods are nested according to a hierarchical order; each period is characterised by one class which contains in itself a morphospace. Every class of the latter is itself a further morphospace. It is a Matryoshka-like structure (with the crucial difference that morphospaces can well be different at each level). Adjacent morphospaces do not explicitly mix in an aggregate morphospace.
For simplicity's sake we adopt the consolidated hierarchy of chronological time (the year and its subdivisions, on one hand, and its supradivisions such as decades, centuries and millennia, on the other). We release in particular the following nested structure and the related morphospaces: the year, composed of four quarters, each composed of 13 weeks (with the mediation of pseudo-months of an integer number of weeks, 4 or 5), each composed of 7 days, each composed of intervals of hours, down to the hour level.
The year is contained in a decade (with 2 adjacent shapes with 5 classes), which is contained in a century (with the same structure of 2 adjacent shapes), which is contained in a millennium. You can replicate this structure up to millions or billions of years (thus leading to domains typically outside of economics, entering geology and astronomy).
For each of these "containers" we provide the morphospace, with a certain, coarse subdivision of 100, which once again we use to "close" the total into a percentage. In other terms, we distribute the following exhaustive lists of shapes:
1. shapes(20, 5) for the year in its supradivisions;
2. shapes(50, 4) for quarters in a year;
3. shapes(20, 3) for pseudo-months;
5. shapes(20, 7) for days in the week;
6. shapes(20, 6) for 4-hour periods;
7. shapes(20, 4) for hours in the 4-hour periods.
A finer resolution would require a very significant computing time and storage space. If you have access to such resources, please use our core method and distribute the shapes you "mine".
You will notice that the most extensive list is shapes(20,7), thus the others can be obtained by cutting it (deleting all rows with a positive value in the last column(s) and then deleting the last column(s)). To save you time, we directly provide the different lists of shapes (3-7).
With what we already distribute, given an aggregate value in a year (e.g. GDP) and a selection of shapes for nested sub-annual periods, you can disaggregate the yearly value down to single-hour precision across the whole year, without needing to assume any equidistribution.
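As an illustration of this nested disaggregation, the following Python sketch splits a yearly total down two levels of the hierarchy; the total and the shapes used are hypothetical, chosen only to show the mechanics.

```python
def disaggregate(total, shape):
    """Split a period's total across its sub-periods proportionally to a shape."""
    n = sum(shape)
    return [total * units / n for units in shape]

# hypothetical yearly total and shapes, for illustration only
year_total = 1200.0
quarters = disaggregate(year_total, (10, 15, 15, 10))           # a shapes(50,4) member
pseudo_months = [disaggregate(q, (5, 7, 8)) for q in quarters]  # shapes(20,3) members
# every level preserves its parent's total, with no equidistribution assumed
```

Repeating the same step with the weekly, daily and hourly shapes of the released morphospaces carries the yearly value all the way down to hours.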
It is a pretty compact way to express values over time. To put things in perspective, you need 1 shape to distribute the yearly total into quarters, a further 4 shapes to distribute quarterly totals into pseudo-months, 12 shapes to get weekly values, 52 shapes for daily values, 364 shapes to get the values for the 4-hour periods, and finally a further 1456 shapes for hourly values. This means you need 1889 shapes to replicate with a fair degree of precision 8544 hourly values, a compression rate of slightly more than 4.5:1. In other words, if you had taken the percentage of the yearly total accruing to each hour, you would have needed 8544 values. Moreover, if we hadn't adopted a nested approach, the morphospace would have required 8544 classes. A precision of 1% of the yearly total (which is well below the precision we obtained by nesting) would have implied a shapes(100, 8544), an astronomically long list of shapes.
If you were to want an even more compact representation, you can test whether shapes in the lower levels of the nested time structure belong to one or just a few patterns (for instance, whether all Mondays have the same hourly percentage distribution, or all Mondays in a certain quarter have the same, or at least belong to the same pattern).
As an empirical example of use, we also release the shape identifiers for all those morphospaces from the electricity consumption of Switzerland in 2016 (for more years and countries, see ENTSO-E).
This nested structure provides a new method for forecasting and early warning systems. Indeed, it generates from any current value a possible pathway forward over time. Let's see how it works. Given a year and its shapes in the nested morphospaces, you generate the following year based on an expectation of the new aggregate value. For instance, you expect a rise in electricity consumption by 5%. By replicating the same morphospace, you get a forecast for hourly periods from the midnight of 31.12.2018. As time passes and new data come in, three situations can arise:
1. the data confirm the forecast;
2. the data are different than the forecast but confirm the shape, in the sense that a proportionality can be established, with a scalar multiplying the forecasts to get the actual data;
3. a change in shape is detected, which can lead to two subcases: 3.1 the shape is different only at the time level at which new data flow in (e.g. hours); 3.2 the shift in shape is detectable also at the supradivisional level (e.g. days).
Statistical methods can be established to distinguish these cases. In particular, in cases 2 and 3, a new forecast can be generated, with a method to establish a threshold of change that triggers recomputation. In other terms, the original expectation is revisited when deviations 2 and 3 become "too large". As a criterion for judging what is "too large", you may use the partition of the morphospace and recompute in case 3 only when the newly envisaged shape belongs to a different pattern.
Following up our example, if in January the actual values are much higher than what a 5% growth would have implied, then you can, if your method so indicates, revise the forecast for the entire year upwards (including for the forthcoming February), and you can check this new forecast as early as the initial days (or hours) of February. You might want to have a rule about a minimum amount of time before a new recomputation, or a maximum number of recomputations per year.
You can test different methods and thresholds for recomputation with these data about the electricity production of Switzerland in 2017. For an out-of-sample validation of your rules, you can use these data for 2018.
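A minimal Python sketch of this three-case diagnosis; the tolerance threshold and the proportionality test are illustrative assumptions, not the calibrated statistical methods the text calls for.

```python
def diagnose(actual, forecast, tol=0.02):
    """Classify incoming data against a shape-based forecast.
    `tol` is an illustrative relative tolerance, not a calibrated threshold."""
    # case 1: the data confirm the forecast
    if all(abs(a - f) <= tol * abs(f) for a, f in zip(actual, forecast)):
        return "confirmed"
    # case 2: the shape is confirmed up to a scalar of proportionality
    k = sum(actual) / sum(forecast)
    if all(abs(a - k * f) <= tol * abs(k * f) for a, f in zip(actual, forecast)):
        return "rescaled"
    # case 3: a change in shape is detected
    return "shape change"

diagnose([10.5, 10.5, 10.5], [10, 10, 10])  # → "rescaled" (scalar k = 1.05)
diagnose([5, 15, 10], [10, 10, 10])         # → "shape change"
```

In case 2 the scalar k becomes the revised growth expectation; in case 3 one would further check whether the new shape falls into a different pattern before recomputing.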
In all this, we bordered on relatively common methods applied to de-seasonalise time series and to make different periods of time comparable (e.g. taking into account the number of working days). However, we provided a more systematic and structural foundation for new methods and offered an avenue for early warning systems for modifications in yearly totals.
More generally, we provided an example of how to cope with cases where a non-nested approach would have required an excessive number of classes.
We hinted at the contribution that morphogenesis can make to the issue of computing micro-data given a single aggregate value. This issue is usually solved by assuming equirepartition (averages). If sales in a week are 120, equirepartition tells you that on each day you sold 120/7. By contrast, with morphogenesis you provide one IDShape in addition to the value of the sales and you redistribute the total in an unequal, but known, way (across Monday, Tuesday, etc.). Without morphogenesis you would have needed 7 parameters.
We also showed that our approach can in principle be used to compress information, which in itself is a field of interest in data and computer science. Conversely, it is very likely that in such fields ways to use shapes have already been explicitly or implicitly identified and applied, pointing to a possible intersection of interests.
You may have reason to want that both (or all) the meso-shapes involved in operations and the resulting macroshape belong to the same morphospace. For instance you might want to average two mesoshapes of the canonical morphospace and obtain a macroshape that is in the canonical morphospace. This is legitimate because one major interpretation of the canonical morphospace is that units are percentage blocks of 2% each and n=50 means the 100% of anything.
This requires first a sum, then a division. You sum two shapes class-wise (column-wise) and then divide by two. The sum of (10,10,10,10,10) and (44,2,2,2,0), divided by two, is ((10+44)/2, (10+2)/2, (10+2)/2, (10+2)/2, (10+0)/2), thus (27,6,6,6,5). This result is the average of the two shapes.
However, an immediate problem arises. If one class has an even numerosity and the corresponding class has an odd numerosity, the sum is an odd number, which divided by two yields a non-integer. The result of the division exits the morphospace: averaging (10,10,10,10,10) and (44,3,2,1,0) gives (27, 6.5, 6, 5.5, 5).
The next obvious step is to round the decimal numbers to the nearest integers (thus rounding the shape to the nearest shape of the morphospace). If the decimal part of the number is higher or lower than 0.5, the rounding is univocal and the sum of 50 is maintained. If we sum three shapes in the same conditions, you get fractional numbers with an unlimited number of decimals: averaging (10,10,10,10,10), (44,3,2,1,0) and (10,10,10,10,10) gives (21.33..., 7.66..., 7.33..., 7, 6.66...), which rounds to (21,8,7,7,7), for which n = 50.
But if the decimal part is exactly 0.5, then you can round up or round down equally well. If you systematically choose to round down, the resulting shape will have fewer units than the original n. Rounding down (27, 6.5, 6, 5.5, 5) gives you (27,6,6,5,5), where n = 49. This time you have exited the morphospace (although entered a "smaller sister" of it).
If you have, as in this case, two rounding operations, you should alternate rounding up and rounding down. (27, 6.5, 6, 5.5, 5) becomes (27,7,6,5,5) or (27,6,6,6,5). Both cases are within the morphospace with n = 50. A useful convention in applied statistics is to round to the even integer (i.e. between an odd and an even integer, choose the even one), which automatically translates into the alternation between rounding up and down that is needed to maintain the sum (50 in this case). According to this convention, the average of those two shapes is (27,6,6,6,5). However, it is still possible to end up with only 49 units. In this case, you add one to the class with the highest numerosity, which corresponds to the criterion of minimizing the relative impact. By the same criterion, if after rounding with the nearest-even-integer convention you end up with a sum of 51, you subtract one from the class with the largest numerosity. If two or more classes tie for the highest numerosity, you need to establish a further criterion or end up choosing randomly.
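The whole convention (round halves to even, then restore the sum by adjusting the largest class) can be sketched in Python as follows. Ties between largest classes are broken by taking the first, which is one of the further criteria the text leaves open.

```python
from fractions import Fraction

def average_shapes(shapes, weights=None):
    """Class-wise (weighted) average of shapes, kept inside the morphospace.

    Rounds halves to even and, if the rounded members no longer sum to n,
    adds/subtracts one unit to/from the largest class (the first one on ties)."""
    if weights is None:
        weights = [1] * len(shapes)
    n = sum(shapes[0])          # all operands are assumed to share the same n
    total_w = sum(weights)
    exact = [sum(w * Fraction(s[k]) for w, s in zip(weights, shapes)) / total_w
             for k in range(len(shapes[0]))]
    result = [round(x) for x in exact]  # round() on Fraction rounds halves to even
    while sum(result) < n:              # e.g. a sum of 49: add one unit
        result[result.index(max(result))] += 1
    while sum(result) > n:              # e.g. a sum of 51: remove one unit
        result[result.index(max(result))] -= 1
    return result

average_shapes([(10, 10, 10, 10, 10), (44, 3, 2, 1, 0)])                 # → [27, 6, 6, 6, 5]
average_shapes([(10, 10, 10, 10, 10), (44, 3, 2, 1, 0)], weights=[3, 1]) # → [18, 8, 8, 8, 8]
```

Exact fractions avoid floating-point artefacts in the thirds that appear when averaging three shapes; the optional weights cover the scalar-multiplied case discussed below.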
In the rule-based case we are developing in ch. 7, the pattern definition does not require a fixed sum and can be applied to any 5-class morphospace (of any n, including cases with decimals). Thus we may comment on the previous example that averaging two shapes belonging to two different patterns leads to a shape belonging to a third pattern. The convention was instrumental in identifying which one (the decimally precise solution belongs to a further, fourth pattern).
(10,10,10,10,10) belongs to the ==== pattern, while (44,3,2,1,0) belongs to the ---- pattern. The resulting (27,6,6,6,5) belongs to the -==- pattern; (27, 6.5, 6, 5.5, 5) belongs to the ---- pattern.
To generalize, you can compute all the possible sums between couples of patterns' medoids, each representing a pattern, and utilize this as the normal result. A shape belonging to a pattern P1, averaged with a shape belonging to a pattern P2, normally results in a shape of the pattern P3 computed once and for all as the average of the two medoids of P1 and P2. Note that equalities can hold across P1, P2 and P3. Using the rule-based case, this requires 541*540 cases.
If you are not satisfied with the normal result, you carry out the sum of the two specific shapes, then divide; you do not round, but compute the pattern of the resulting (possibly fractional) shape.
You may repeat this operation for all possible couples and attribute the result (be it integer, by computation or convention, or fractional) to a pattern. It is very likely that only certain borderline shapes will generate outcomes deviating from those obtained by averaging the patterns' medoids.
In general, the average of two (or more) shapes does not remain in the morphospace of the operands without the convention, but in any case it can be tracked back univocally to a pattern of that morphospace. With the convention of rounding to the nearest integer and, in case of equal distance, to the nearest even integer, the resulting shape remains in the morphospace.
This is true also when you multiply a mesoshape S1 by a scalar and sum it with another mesoshape S2 multiplied by another scalar. To get the weighted average, you divide the vector whose columns are the sums of the scaled columns by the sum of the two scalars; the result requires the convention in order to belong to the morphospace to which S1 and S2 belong. Moreover, that shape belongs to a pattern that exists in that morphospace.
(10,10,10,10,10) * 3 + (44,3,2,1,0) * 1 = (30+44, 30+3, 30+2, 30+1, 30+0) = (74, 33, 32, 31, 30), which divided by (3+1) = 4 gives (18.5, 8.25, 8, 7.75, 7.5) and, by applying the convention, becomes (18, 8, 8, 8, 8), whose members sum to 50 (thus the shape belongs to the morphospace). The respective patterns (of S1, S2, the exact weighted average and the rounded result) are ====, ----, ----, -===.
This means that if you want to weight the two shapes, you can operate and remain in the morphospace. If the mesoshapes represent subnational regions you might want to weight them according to regional GDP or regional population in order to get the national macro-shape.
A possible avenue of further improvement is to change the rule upon which patterns are defined in ch. 7. In particular, you might explore what happens if the definition of "equal to" is extended to "more or less equal to". The distribution of patterns in table 6 would be different if you relaxed the "equal to" relation to include "a difference not larger than" (e.g. 1, 2 or 3). More shapes would then belong to the patterns at the bottom of the table. Fewer shapes would belong to the bulk characterised by strict inequalities.
In the main text we skipped all these issues by defining the sum of two shapes without requiring that they belong to the same morphospace, nor that the result belongs to any of their morphospaces.
Inspired by Mendeleev's periodic table of elements (for physics and chemistry), the periodic morphotable re-organizes in a compact way all possible shapes of several morphospaces. In so doing, it provides economics, social and human sciences with an instrument to map dynamics and track macro- and meso-shifts. It gathers shapes that share certain features in one cell, while giving a meaning to both rows and columns, much like groups and periods in Mendeleev's table. The latter's role in the history of science, in the consolidation of chemistry and as a sound foundation for reaction evaluation and new discoveries, can hardly be overstated. We humbly hope to bring some contribution to a similar dynamic in economics.
We first present the general logic of the table, then give a coloured visualisation of it, some explanations of why certain cells are empty by definition, and one numerical and graphical example of a shape from the canonical morphospace for each cell, contained in this excel file.
The periodic morphotable (or the "periodic table of forms", as one could call it) is a finite bidimensional table that contains references to shapes of any numerosity of units and classes. It only requires that the classes be ordered, so that it makes sense to look at the difference between the first and the last column as the major axis of interpretation. This happens, for instance, when classes are time periods and you can compare the beginning with the end, which is very often the case with economic time series.
In the table, the left side is reserved for cells in which shapes are characterised by a first column taller than the last one (i.e. an overall descending tendency). On the right side of the table we put cells in which shapes have a first column shorter than the last one (i.e. they are overall ascending). In the middle of the table we align the cells containing overall stationary shapes, in the sense that the first and the last column are equal.
As for the vertical order, in the first row at the top we localize the shapes that are monotonic, i.e. that maintain between each pair of successive columns the same (dis-)equality as between the first and the last.
In the second row from the top, the periodic table localizes the shapes characterised by one "opposition", i.e. in one pair of successive columns there is a deviation from the overall tendency (e.g. there is a rise or stability whereas the general tendency between the first and the last column exhibits a fall).
In the third row from the top, you find cells with shapes with two oppositions. In the fourth, three oppositions. As we consider 5 classes (columns) as the canonical case, it is impossible to have more than 3 oppositions (there are four pairs of subsequent classes and at least one needs to be in the overall direction). Accordingly, to avoid having cells empty of shapes from the canonical morphospace, we temporarily define the fifth row as "four oppositions or more". If you are studying morphospaces with a higher number of classes, you may want to extend the table downwards.
This is exactly what happens with the Mendeleev table: the higher rows are exhaustive, while the lower rows are filled as new elements are sought and finally found (or artificially created). Some cells in the upper centre of the table are simply logically impossible.
In other words, groups share the same overall tendency; periods differ in monotonicity and in its deviations with respect to the overall tendency.
The table has 12 columns in total. In the leftmost 4, we localise respectively the cells containing shapes in which: 1. the differences in line with the overall tendency are equal (constant fall); 2. the differences are rising (accelerating fall); 3. the differences are falling (decelerating fall); 4. there is no regularity in the differences.
In the 4 rightmost columns, by symmetry, we localise in the last column a constant rise (reflected in equal differences between the columns that exhibit a rise); in the second-to-last an accelerating rise; in the previous column a decelerating rise; and in the further column (n. 8) an irregular sequence of acceleration, deceleration or equal rise.
In the columns at the horizontal centre of the periodic table, which are all characterised by a stationary overall tendency, the first row cannot but have just one cell (since there are no oppositions), and the second row is empty, because one opposition (a rise or a fall) needs to be rebalanced by another for the overall tendency to be stationary.
For the lower rows (from the third down), the sequence is the following: in column 5 we place shapes characterised first by a fall and then a rise; in column 7, shapes first with a rise and then a fall; in between (in column 6), shapes with other sequences, which necessarily require more than two oppositions.
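The placement rules above can be sketched in Python. This toy function only returns the overall tendency and the number of oppositions, i.e. the horizontal block and the row, not the full column within the block.

```python
def cell_coordinates(shape):
    """Locate a shape in the morphotable: the overall tendency gives the
    horizontal block, the number of oppositions gives the row."""
    sign = lambda x: (x > 0) - (x < 0)
    tendency = sign(shape[-1] - shape[0])  # +1 rise, 0 stationary, -1 fall
    steps = [sign(b - a) for a, b in zip(shape, shape[1:])]
    # an opposition is a pair of successive columns deviating from the tendency
    oppositions = sum(1 for s in steps if s != tendency)
    return tendency, oppositions

cell_coordinates((44, 3, 2, 1, 0))    # → (-1, 0): monotonic fall, first row
cell_coordinates((12, 20, 18, 0, 0))  # → (-1, 2): overall fall, two oppositions
```

Distinguishing accelerating, decelerating and irregular dynamics within a block would further examine the magnitudes (not just the signs) of the successive differences.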
The full morphotable, with self-explaining coded name for each cell, is here:
The periodic morphotable is constituted by 47 legitimate (i.e. logically possible) cells. In the first row, for a stationary tendency without oppositions it is impossible to have a rise and a fall in any order, so the corresponding cells are necessarily empty (they are indicated in white and without acronym). With one opposition (i.e. one rise or one fall), it is impossible for the value of the first class to come back equal in the final class. So all cells for the stationary tendency are empty in the row with 1 opposition. With two oppositions to stationarity, one must be a fall and the other a rise (if not, they would not compensate and the shape would not be stationary), in either of the two possible orderings (first a rise then a fall, or the opposite). Accordingly, no "Other" cell can be legitimate under any tendency.
The cells actually used by the canonical morphospace are only 33, but more cells will be used in wider morphospaces with more classes. With three oppositions, in the canonical morphospace there is only one first difference with the same sign as the general tendency, so there is nothing to compare it with, leading us to attribute all these cases to the Constant column. The other columns (of the Fall or Rise tendency) are empty. Already with 6 classes they would contain some shapes. So in the general formulation of the table these cells are listed; they simply contain no example from the canonical morphospace.
Also in the canonical morphospace, four oppositions can lead to a stationary shape (because they compensate each other) but to nothing else (a general Rise tendency requires at least one segment to actually exhibit a rise). And in that context, more than 4 oppositions are impossible. But we called the row "4 or more" because in this way wider morphospaces will exhibit positive examples in such cells, while keeping the table limited. The table itself does not need to widen even for much wider morphospaces (at some point one might want to add new rows, but the logic would continue to hold).
Please note that the 33 cells already accommodate all the 316 251 shapes of the canonical morphospace shapes(50,5).
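This count can be checked with the stars-and-bars formula for weak compositions, a one-liner in Python:

```python
import math

def count_shapes(n, s):
    """Number of shapes(n, s): weak compositions of n units into s ordered
    classes, counted by stars and bars as C(n + s - 1, s - 1)."""
    return math.comb(n + s - 1, s - 1)

count_shapes(50, 5)  # → 316251 shapes in the canonical morphospace
```

The same formula gives the size of any of the morphospaces released with the paper before generating them.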
Many shapes share the same cell. Many shifts in shapes do not involve changes in the morphotable cell. A change in pattern may not involve a change in cell, since the 47 cells are fewer than the 541 rule-based patterns identified in ch. 7. The table covers morphospaces of any number of classes and units, which strengthens all three points we just raised.
Conversely, a modification from one cell to another signals a relatively large shift in the shape's properties.
This does not mean that many changes at the micro level are necessary in order to change cell. For instance, a new period of time, shifting all the class values leftwards by one class (with the disappearance of the most remote one), may well generate a large change, since it may invert or modify the overall tendency, to the effect of a large jump from left to right (or in the opposite direction).
A further formal definition of groups of cells of the morphotable can be given in terms of the rule-based patterns of ch. 7. In particular, the overall tendency, which decides on which side of the table the shape is localised, is nothing else than the 4th comparison (taking value +1 if the first column is taller than the 5th, value 0 if they are equal, and value -1 if it is smaller).
However, the table, with its interest in accelerated and decelerated dynamics, adds distinctions that are not available at pattern level.
The morphotable is a compact way to collect shapes and graphically exhibit morphogenesis as a process of successive shapes, especially distinguishing segments of morphogenesis that occur within the same cell from segments that generate a jump in cell.
You are probably curious to have a numerical example for each cell, so here you have the periodic table of shapes, filled in all cells for which the canonical morphospace contains a positive example.
You can see the graphical representation of each of these examples in this excel file devoted to the morphotable. In it, you will find the individual attribution of each shape of the canonical morphospace to a cell of the morphotable. In synthesis, the numerosity of the cells filled by the canonical morphospace is the following:
It's symmetrical between Fall and Rise, with a growing number of shapes as the oppositions grow. Within the Fall block, the shapes become more numerous as you proceed rightwards (with the symmetric opposite being true in the Rise block).
You may explore the corresponding distributions for different morphospaces. Possible extensions might involve a fourth block (undetermined overall tendency) and further bottom lines for higher numbers of oppositions (occurring with more than 5 classes). A further direction of analysis is the use of the morphotable to explore the macro-shapes resulting from operations on meso-shapes belonging to the same cell or to different cells.
Each shape of a morphospace can be seen as a discrete categorical probability distribution. The morphospace contains all possible discrete categorical probability distributions under quantized probability.
The quantization of probabilities adds a fourth condition (that each probability is an integer multiple of a small quantum) to the three conditions on all probabilities (to be non-negative and to add to 1 when the summation is extended to all possible events, which are in turn mutually exclusive). We hardly need to add that the operation of quantization has played a key role in atomic physics (thanks, among others, to Planck and Bohr) and that there are many attempts at quantization of other physical properties and phenomena. As our version of pluralism is centred on humans and human decision-making, we are eager to embrace quantization as a way to reflect a manageable multiplicity.
In more technical terms, for a small enough quantum a quantized probability distribution will turn out to be indistinguishable from any given non-quantized probability distribution, in the sense that samples of infinite size would be necessary for the difference to be statistically significant.
Let's recall from ch. 4 that the canonical morphospace has five classes (the categories that can characterise the units) and 50 units. Thus, in interpreting one of its shapes as a probability distribution, the quantum is 0.02 and each probability can take any of the values 0, 0.02, 0.04, 0.06, ..., 0.96, 0.98, 1: 51 alternative values. The shape (12, 20, 18, 0, 0), a member of the canonical morphospace, represents, in this appendix and in the following one, the probability distribution in which class 1 has probability 12*0.02 = 0.24, class 2 has 20*0.02 = 0.40, class 3 has 18*0.02 = 0.36 and the others nil.
In general, for a morphospace (n, s), where n is the number of units and s the number of classes, the quantum is 1/n. As said before, you can define a class in any way you want, including as an interval of real numbers.
Since the morphospace, as obtained by running the core code presented in ch. 3, is the exhaustive list of all possible shapes for the given number of classes and the given horizontal sum of its members, the morphospace is also the exhaustive list of all possible probability distributions for the given number of classes and the given quantum (the building block of quantized probability). It's more difficult to say than to understand.
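The core code released with this paper is in VBA; as a minimal, hypothetical sketch of the same idea, the following Python fragment enumerates every shape of a morphospace by the standard stars-and-bars construction and confirms the counts used throughout this paper (the function name `shapes` is our own shorthand, not the released code):

```python
from itertools import combinations
from math import comb

def shapes(n, s):
    """Enumerate all s-tuples of non-negative integers summing to n:
    each tuple is one shape of the morphospace shapes(n, s)."""
    # stars and bars: choose s-1 bar positions among n+s-1 slots
    for bars in combinations(range(n + s - 1), s - 1):
        prev, parts = -1, []
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(n + s - 2 - prev)
        yield tuple(parts)

print(sum(1 for _ in shapes(2, 5)))   # 15 shapes of 2 units in 5 classes
print(comb(54, 4))                    # 316251, the size of shapes(50, 5)
```

Enumerating shapes(50, 5) this way yields exactly the 316251 shapes of the canonical morphospace.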
In the next appendix, we shall show how this interpretation sheds new light on the statistics of small samples. We shall determine, for any possible sample of given size, the shape (thus the probability distribution) with the maximum likelihood of generating it. We shall identify the probability of any possible shape having generated the sample, thus offering the foundation for confidence intervals of shapes given the sample.
Here and in this Excel file, we limit ourselves to constructing the random extraction from any shape contained in the canonical morphospace. In the sheet Probabilities, each line contains a shape and a set of fifty cells, each filled with a number corresponding to one of the five classes. There are as many cells filled with a given class number as the numerosity of that class. At line 102411, the shape (12, 20, 18, 0, 0), whose IDShape is 102411, is accompanied, in columns J to BG, by 12 cells filled with 1, 20 cells filled with 2, 18 cells filled with 3 and no cells with other values.
A uniform random extraction across the fifty cells provides an extraction from the shape, probabilistically interpreted. We provide the code (in VBA) that filled the cells, so that if you have a different morphospace you can re-apply the procedure and give it a probabilistic interpretation.
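The same uniform extraction can be sketched outside Excel. This hypothetical Python equivalent (the released code is in VBA) draws a class with probability proportional to its numerosity, which is exactly what picking one of the fifty filled cells uniformly at random does:

```python
import random

shape = (12, 20, 18, 0, 0)   # a member of the canonical morphospace

def draw(shape, rng=random):
    """One extraction: uniformly pick one of the sum(shape) filled
    cells, i.e. pick class c with probability shape[c-1] / sum(shape)."""
    return rng.choices(range(1, len(shape) + 1), weights=shape)[0]

sample = [draw(shape) for _ in range(10)]   # ten independent draws
# classes 4 and 5 have zero weight here, so they never appear
```

Repeating `draw` many times and tallying the results reconstructs, on average, the shape you started from.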
In the Control panel sheet you decide how many extractions (runs) you want and for which shapes. You can also fix how many repetitions of the extractions you want. You get the extracted values in column O and subsequent ones of the sheet Random draws, by clicking on the button "Random draws" in the Control panel sheet. Their synthesis as a new shape is in columns I to M of the sheet Random draws.
In the Control panel, column D, you can give a sequence of shapes, each with one extraction and one repetition; you thus obtain a sample resulting from non-identically distributed probabilities. By repeating this procedure many times you construct, via a procedure in the large family of Monte Carlo methods, the full distribution of samples. In particular, if you repeat this procedure 50 times, you obtain a (vertical) new shape included in the canonical morphospace. This is a new example of an abstract process leading from shape to shape, thus of temporal morphogenesis.
As you can understand, the sequence of shapes you select in column D is the morphogenesis of which we have been writing throughout this long paper, and in particular in ch. 6. By choosing a sequence of shapes, indicating a large number of extractions and a repetition count of one, you get small empirical variations of the morphogenesis. By repeating this procedure many times you obtain not only the average (which is the pure morphogenesis list you imposed at the beginning) but also the variance surrounding the average and the full distribution of outcomes. You now have a method to trace an empirical sequence of shapes back not only to a deterministic list of shapes in the canonical morphospace, but also to a cloud of similar empirical sequences: a deterministic core with its stochastic surroundings.
Moreover, if you were to make the sequence of shapes dependent on the value obtained in the random extraction, you would get a sample resulting from non-independently distributed probabilities. If the extraction of a value means that the next shape is taken from the smaller sister morphospace shapes(49, 5), namely the shape in which the extracted class has a numerosity equal to that of the original shape minus one, you are making extractions without reinsertion (a classical theme in statistics). The more a class is extracted, the less likely it is to be extracted next. In the opposite direction, if the sequence of shapes depends on the value extracted in the sense that the new shape contains one more unit in the extracted class (and there are many shapes that share this feature; you can choose any of them), then the more a class is extracted the more likely it is to be extracted next. Here you have two further examples of abstract processes of temporal morphogenesis.
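These two dependence regimes can be sketched as urn processes. In this hypothetical Python illustration (function names are ours; shapes are tuples of class counts as above), the first process removes the extracted unit, moving through ever smaller sister morphospaces, while the second reinforces the extracted class in the style of a Polya urn:

```python
import random

def draws_without_reinsertion(shape, steps, rng):
    """Each extracted unit is removed: after every draw the shape
    belongs to a smaller sister morphospace (n-1 units, then n-2, ...)."""
    shape, out = list(shape), []
    for _ in range(steps):
        c = rng.choices(range(len(shape)), weights=shape)[0]
        shape[c] -= 1          # the class just drawn becomes less likely
        out.append(c + 1)
    return out, tuple(shape)

def draws_with_reinforcement(shape, steps, rng):
    """Polya-urn style: the extracted class gains one unit, so the more
    a class is drawn the more likely it is to be drawn next."""
    shape, out = list(shape), []
    for _ in range(steps):
        c = rng.choices(range(len(shape)), weights=shape)[0]
        shape[c] += 1          # the class just drawn becomes more likely
        out.append(c + 1)
    return out, tuple(shape)
```

Each run of either function is itself a short temporal morphogenesis: a sequence of shapes, one per draw.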
To carry out these and other experiments with possible abstract processes, you'll need to modify the code in the Random draw button and collect the results in further sheets of synthesis.
Moving beyond these technicalities, we summarize this appendix by underlining that we applied our general method, temporal morphogenesis, and its tools (shapes and morphospaces) to statistics. Relying on the previous few simple steps, we generated the exhaustive list of all possible categorical probability distributions under quantized probability and provided a method to generate an arbitrary number of random draws from the list. This opens the way for a systematic unified study of non-identically distributed extractions, especially from non-Gaussian distributions. Each of them has obviously been the object of previous analyses, with much more formal methods. However, they were not comprehensively and equally treated in a unified way. More than 300 000 distributions were included, to offer a reference starting point. Recalling appendix 5 on the periodic table of shapes, a large number of non-monotonic shapes, with convex and concave alternations, are included, with the final result of exhaustiveness: whichever distribution you might have, if it has the same number of classes in its domain and the same quantum, it cannot but have been included. If it has a different number of classes or a different quantum, you can run our core code and get the morphospace which would exhaustively include your case. In its generality, the study of such distributions must rely on a Monte Carlo method of investigation. In the next appendix, we shall compute some exact probabilities in a rational way, based on well-known formulas and approaches.
A large body of methods and results in the science of statistics is linked to the law of large numbers, in its strong or weak form. In particular, statistical inference and hypothesis testing often assume a sample of more than 30 members so that, thanks to the central limit theorem, its average is distributed according to a Gaussian (normal) distribution irrespective of the shape of the real distribution, provided all random draws are identically distributed. Small samples, with fewer than 30 members, are however frequent when sampling is expensive, difficult, or faces structural or practical constraints. In empirical surveys of any size, a bivariate or multivariate cross-tabulation analysis will always generate cells that at some point contain "too few" cases (thus they collapse to small samples). In other words, small samples are important and there is a strand of statistical research devoted to extracting the maximum information out of them.
Small samples are usually studied with Student's t statistic, which, however, is valid only if the shape of the real distribution is Gaussian and all random draws are independent and identically distributed. We shall relax all these assumptions and provide exact and Monte Carlo methods for performing statistical inference when the random draws are from a non-Gaussian distribution, are not identically distributed, and are not independent.
The goal of this strand of research is not to identify averages but full distribution shapes and their patterns, as defined in ch. 7. Since the labels of the classes are irrelevant and may well refer to non-ordered and non-comparable categories, averages are not defined in our domain.
In the previous appendix we highlighted that each shape of a morphospace can be seen as a discrete categorical probability distribution and that a morphospace contains all possible discrete categorical probability distributions under quantized probability. Now we add that every possible sample of size N is enlisted in the morphospace shapes(N, s), with s being the number of alternative values that its members can take. The smaller sisters of the canonical morphospace, as defined in ch. 7 and actually distributed in appendix 1, already provide the full list of all possible extracted samples of size 1, ..., 50, thus all possible small samples. For instance, the morphospace shapes(2, 5) contains the 15 possible extractions of two elements; the 1001 possible samples of 10 extractions are in shapes(10, 5). You can't extract a sample of 10 members (from a population distributed in five classes) without finding it in the morphospace.
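These counts follow from the standard stars-and-bars formula: shapes(N, s) has C(N+s-1, s-1) members. A quick check in Python (note that for N = 10 and s = 5 the formula gives 1001):

```python
from math import comb

def n_shapes(N, s):
    # number of s-part compositions of N, i.e. the size of shapes(N, s)
    return comb(N + s - 1, s - 1)

print(n_shapes(2, 5))    # 15
print(n_shapes(10, 5))   # 1001
print(n_shapes(50, 5))   # 316251
```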
By cross-tabulating all shapes in the canonical morphospace with all possible samples for N = 1, ..., 10, ..., 30, we are going to show you the probability that a sample is obtained under every possible categorical discrete probability distribution (given a quantized probability). From that we shall single out, for every possible sample, the shape from which it comes with the maximum likelihood. We construct the full probability distribution over the shapes of having generated such a sample. From it we shall extract the list of shapes from which it can come with a probability higher than any threshold, and the shortest list of shapes whose probability sum is at least a certain threshold: a "confidence interval" in this particular context.
To follow the discussion, it is better to download and look at this file, which also allows you to modify certain parameters (such as the thresholds). We cross-tabulated the canonical morphospace (and its 316251 shapes) with the morphospace shapes(2, 5), which collects all possible samples of size 2 (two extractions or, equivalently, two random draws). In each cell of this cross-tabulation, we imposed the multinomial distribution with the corresponding parameters: the probability of getting the sample (in column) from a population (in row). This is a well-known statistic that gives the value with rational exactness. A large number of actual draws would confirm it (or conform to it, if you like).
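Each cell value is the standard multinomial probability. A self-contained Python sketch of that computation (the released files do this in Excel; the function name is ours):

```python
from math import comb, prod

def multinomial_prob(sample, pop):
    """Probability of observing the sample (counts per class) when
    drawing sum(sample) times from the population shape `pop`,
    read as a quantized categorical distribution."""
    n = sum(pop)                     # 50 in the canonical morphospace
    coef, rem = 1, sum(sample)
    for k in sample:                 # build the multinomial coefficient
        coef *= comb(rem, k)
        rem -= k
    return coef * prod((c / n) ** k for c, k in zip(pop, sample))

# e.g. the sample (1, 1, 0, 0, 0) drawn from the shape (12, 20, 18, 0, 0):
# 2 * 0.24 * 0.40 = 0.192
p = multinomial_prob((1, 1, 0, 0, 0), (12, 20, 18, 0, 0))
```

Repeating this for every (population, sample) pair fills the whole cross-tabulation.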
In each cell we do something normally done in statistics. The novelty is that we repeat this operation for all rows and columns, i.e. for every possible sample and for every possible population. In this case we repeat the operation 316251*15 = 4 743 765 times. This gives you a full picture.
The probabilities on the same row sum to one: it is certain that one of the samples will be drawn. The probabilities in the same column are the likelihoods that a given sample is obtained from each population. The completeness of the picture allows you to single out the population for which the sample has the maximum likelihood of being generated. A sample can come from many populations, but only one of them has the maximum likelihood of generating it; thus an indication you collect from the sample is that that population is "better" than the others (in its capability of generating the sample).
Applying the well-known principle of maximum likelihood is here possible and interesting, since it singles out which population is the best in this sense. The sample helps us to say something about the underlying population. If you ask "which is the real population from which this sample comes?", the best single answer you can give is "the population with the maximum likelihood of generating the sample".
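The criterion can be sketched by brute force. In this hypothetical Python illustration we use the deliberately small morphospace shapes(10, 3) so the search stays readable (the released computations use Excel files, and the helper names are ours):

```python
from itertools import combinations
from math import comb, prod

def shapes(n, s):
    # all s-part compositions of n (stars and bars)
    for bars in combinations(range(n + s - 1), s - 1):
        prev, parts = -1, []
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(n + s - 2 - prev)
        yield tuple(parts)

def likelihood(sample, pop):
    # multinomial probability of the sample under the population shape
    n, coef, rem = sum(pop), 1, sum(sample)
    for k in sample:
        coef *= comb(rem, k)
        rem -= k
    return coef * prod((c / n) ** k for c, k in zip(pop, sample))

# sample of size 2: one draw in class 1, one in class 2
best = max(shapes(10, 3), key=lambda pop: likelihood((1, 1, 0), pop))
print(best)   # (5, 5, 0): a fifty-fifty split across the observed classes
```

The maximizer mirrors the sample, anticipating the "banal answer" discussed below.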
For each empirical sample, under the constraint of known number of classes and quantized probability, we offer you the answer to this question:
* by providing you in general the core code;
* by providing you, for each sample of size 2, the values of the likelihood of each shape generating the sample, from which you can single out the shape with the maximum likelihood, in this further Excel file [about 150 MB];
* by releasing the same data for small samples of size 6, for which we computed 316251*120 = 37 950 120 probabilities (zipped csv file); we chose 6 since it is the first value after the 5 classes (it is not good to have a sample smaller than the number of classes, which produces a zero value by definition rather than by computation).
If you look at these ordered lists, you will find something familiar: the maximum likelihood provides the most "banal" answer, since it projects the distribution of the sample onto the population in the most brutal and simplified way. If you have a sample of size 1, then the population is distributed 100% in the class found in the sample. The same answer is given if the sample has size 2 and both random draws share the same class. If instead they belong to two different classes, the population indicated by the criterion of maximum likelihood is a fifty-fifty split between the two classes that appear (with all other classes set to zero).
Having a numerical value for each likelihood allows for comparison across samples of different sizes: the maximum likelihood with a sample of size 1 is 1 (100%), which is obtained five times (once per legitimate sample). Across the 15 possible samples of size 2, the maximum of the maximum likelihood is still 1, but for several samples the maximum likelihood is only 0.5. In other words, you lose something by increasing the size of the sample: you risk getting a sample that can be generated by many shapes, and the maximum likelihood decreases. You get a wider range of possible answers, but the criterion gives you a result which is, in a sense, weaker and weaker. It continues to give you the best answer, but this answer is less and less likely.
Can we do better than that? Well, if we allow the answer to extend to several populations instead of just one, yes, we can. But to do that, we need to consider the columns of the cross-tabulation. The sum in a column is different from one, since it represents all the probabilities that the sample is generated, considering all populations at the same time.
We highlight that by dividing the probability that an individual population (one row) has of generating the sample (in column) by that sum, you obtain the probability that the population generated the sample. The sum of these ratios is one (it is certain that some population generated the sample).
This ratio is the probability that the shape generated the shape of the sample. To obtain it, we applied the most fundamental way to compute a probability: dividing the number of successful events by the number of all possible events. This value is rational and does not require empirical confirmation. But such confirmation can be provided as well: if you run a large number of random draws from all populations, that ratio is the fraction of times in which the population generated the sample.
The sum of these ratios across more than one population (e.g. 2 or 3) gives the probability that any of those populations generated the sample (since such events are mutually exclusive). Accordingly, you can extend the maximum likelihood criterion to embrace a fixed number of populations (the group of populations with the maximum likelihood of having generated the sample) or, better, you can fix a threshold of probability and find the shortest list of populations whose cumulative probability of having generated the sample is equal to or higher than the threshold.
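The construction of the shortest list can be sketched as follows, again on the small morphospace shapes(10, 3) and with hypothetical helper names (a Python illustration, not the released Excel procedure):

```python
from itertools import combinations
from math import comb, prod

def shapes(n, s):
    # all s-part compositions of n (stars and bars)
    for bars in combinations(range(n + s - 1), s - 1):
        prev, parts = -1, []
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(n + s - 2 - prev)
        yield tuple(parts)

def likelihood(sample, pop):
    # multinomial probability of the sample under the population shape
    n, coef, rem = sum(pop), 1, sum(sample)
    for k in sample:
        coef *= comb(rem, k)
        rem -= k
    return coef * prod((c / n) ** k for c, k in zip(pop, sample))

def confidence_list(sample, n, s, threshold):
    """Shortest list of populations whose cumulative probability of
    having generated the sample reaches the threshold."""
    liks = [(likelihood(sample, pop), pop) for pop in shapes(n, s)]
    total = sum(l for l, _ in liks)          # the column sum
    out, cum = [], 0.0
    for l, pop in sorted(liks, reverse=True):
        out.append(pop)
        cum += l / total                     # the ratio discussed above
        if cum >= threshold:
            break
    return out

top = confidence_list((1, 1, 0), 10, 3, 0.5)
# the list starts with (5, 5, 0), the maximum likelihood population
```

The length of the returned list is the price of the chosen confidence level: higher thresholds admit more populations.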
Since we distribute the full distribution of such ratios, their ranking, and the ordered cumulative probabilities (from which you obtain the confidence list at any threshold), if you have an empirical small sample you can get such a list for your case.
As an aside, we highlight three advantages of our method:
1. in contrast to continuous distributions, we can give a single point estimation with a positive probability, instead of the null probability that characterises the best single answer under such distributions. When a statistician states that 230 is the average of his 2000 random draws from a Gaussian distribution and feels sure that the population average is 230 as well, he is overstretching his method. The Gaussian curve has the real numbers as its domain, thus an infinite number of possible cases, and any number divided by infinity is zero, irrespective of how large it is. In strict terms, he should always talk about an interval surrounding the value: technically, the probability is the integral below the Gaussian curve, and it is zero for any single value. Thus his maximum likelihood solution has a probability of zero. Only an interval surrounding the value has a non-null probability, and not even a high one unless the interval is "large enough"; the actual effort of increasing the sample size is rewarded by the reduction of such an interval. Conversely, with the discrete probabilities with which we approached small samples, we could give maximum likelihood probabilities that are already fairly large at very small sample sizes. We paid a price for that, but our approach squeezes important pieces of information out of extremely limited evidence;
2. the ranked list of populations, in descending order of probability of having generated the sample, does not need to exhibit contiguous populations. In certain morphospaces, different from the canonical one, these populations could be quite different in shape, so the same sample could give indications in different directions. This is good, because it gives the researcher an input for further (focused and well-oriented) steps. For instance, the sample could point to both a monomodal and a bimodal distribution; the researcher could then devise a test to distinguish between these two cases. In other terms, our "confidence interval" is less an interval than a "confidence list";
3. if the random draws leading to the sample were not independent and came from different populations, a modification of this method would provide the maximum likelihood sequence of shapes (i.e. the morphogenesis with the highest probability of having generated the sample) as well as the set of sequences whose cumulative probability is higher than a pre-fixed threshold.
As with standard large-sample statistical inference, once you have a "confidence interval", you can begin to test hypotheses. You choose a distribution as your basic hypothesis for the population (commonly called H0) and you compare it with a competing hypothesis (H1), one-tailed or two-tailed. You use the sample to accept or reject H1 (thus falling back to H0 under certain features of the alternatives). The same logic can be followed in our case of categorical probability distributions with small samples and "confidence lists".
Please note that you can randomly draw one population, generate a sample using the technique described in Appendix 6, and compute the maximum likelihood population given the sample. You know the real population and you know what this method generates, so you get the exact error (if any), using your favourite error statistic (sum of squares, sum of absolute values, etc.) or even all of them, so as to have a full view. By generating many samples you get the distribution of the errors.
One possible development here is related to bootstrap techniques. You can test how well bootstrap techniques work under each shape and determine with which shapes they work well (or not). In particular, by selecting a shape as the original categorical probability distribution, you extract a sample, apply resampling with replacement, use the maximum likelihood method to arrive at a possible shape, measure the error (if any) with respect to the "true" shape you started with, and repeat this to get a full error distribution for that shape. Then you repeat this for all shapes (or for prototypical examples of all patterns) and get answers to the abovementioned research question.
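One such experiment can be sketched in Python (a hypothetical illustration with names of our own; we exploit the fact that, since 50 is a multiple of the sample size 10, the empirical class frequencies of the sample lie in the canonical morphospace and are themselves the maximum likelihood shape):

```python
import random
from collections import Counter

TRUE_SHAPE = (12, 20, 18, 0, 0)   # the "true" population, 50 units

def ml_shape(sample, n=50, s=5):
    """ML shape of a sample whose size divides n: scale the class
    counts so that they sum to n (here 50 = 5 * 10)."""
    f = n // len(sample)
    counts = Counter(sample)
    return tuple(counts.get(cls, 0) * f for cls in range(1, s + 1))

rng = random.Random(42)
sample = rng.choices(range(1, 6), weights=TRUE_SHAPE, k=10)

# bootstrap: resample with replacement, estimate, measure the error
errors = []
for _ in range(200):
    boot = [rng.choice(sample) for _ in sample]
    est = ml_shape(boot)
    errors.append(sum((a - b) ** 2 for a, b in zip(est, TRUE_SHAPE)))
# `errors` is the bootstrap distribution of squared errors for this shape
```

Repeating the experiment over many shapes (or pattern prototypes) would map out where bootstrap works well and where it does not.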
Until now, we have not yet used the key concept of "pattern", which is a central generalisation of shapes. It is extremely useful for small samples, since by summing the probabilities of all shapes belonging to the same pattern you get the probability that the pattern generated the sample, which is in general higher than the former (for this to be true, it's enough that the pattern contains at least two shapes with a positive probability of having generated the sample). Instead of comparing the 316251 shapes, you compare the patterns, which, if defined according to the rule indicated in ch. 7, are only 541. Each pattern can contain up to 1747 shapes, and can thus generate the sample with a much higher probability than each single shape. Unfortunately, that specific definition of pattern embraces shapes that turn out to have very different probabilities, thus the difference between patterns is not necessarily large.
Conversely, the cross-tabulated probabilities give a new possible way to partition the morphospace, which was the general goal of ch. 7. Given a sample and a threshold, a partition of the morphospace into two patterns is achieved by distinguishing the shapes "belonging to the shortest list of shapes whose cumulative probability of having generated the sample is no smaller than the threshold" from the rest of the shapes.
Given a sample size, a partition of the morphospace is given by the groups of shapes that:
1. never belong to the shortest list;
2. belong to one of the shortest lists;
3. belong to 2 of the shortest lists;
... n. belong to all the shortest lists (or, if there is a maximum number of lists to which a shape can belong, to that maximum).
Under certain conditions of regularity (satisfied by the canonical morphospace), a shape always shares the shortest list with the same group of other shapes (or better: if it belongs to a shortest list, then the others do too); thus a sort of "natural" clustering of shapes occurs when filtered by sampling.
In other words, a small sample can shed light on the whole distribution from which it may have been drawn and, especially, on the pattern that characterises such a distribution.