80:20

This is going to be a heck of a lot more relevant than the last one.

Okay then, first of all there's a thing called the Gaussian distribution, also known as the normal distribution or the bell curve, which works a lot of the time when you look at how a quantity varies across a population. It looks like this:

You're supposed to label axes, but on this occasion I haven't because it applies to so many things. One straightforward thing it applies to is heights. If you take a hundred like-gendered adults at random and measure their heights, then arrange them in terms of, say, people who are 5'6", 5'7" and so on, you will find they're distributed something like this. It also applies to IQ. There's a formula to make this but I've never used it and can't remember it. If I wanted to make a normal distribution curve in software, I'd start with a value in the middle of the range and add or subtract one at random, or rather pseudo-randomly, a few times until I let it settle at a particular value, then do the same to another value, and add them together, and this would gradually build up a bell curve.
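As a rough sketch of that approach, here is one reading of it in Python: each value starts in the middle, gets nudged up or down by one a fixed number of times, and the places where the values settle are tallied. The function names, the starting value of 50 and the 100 nudges are all assumptions made for illustration, not anything fixed by the method.

```python
import random
from collections import Counter


def random_walk_value(start=50, steps=100, rng=random):
    """Nudge a value up or down by one, `steps` times, and return where it settles."""
    value = start
    for _ in range(steps):
        value += rng.choice((-1, 1))
    return value


def build_histogram(samples=10_000, start=50, steps=100, seed=0):
    """Tally where `samples` independent values end up (seeded, so repeatable)."""
    rng = random.Random(seed)
    return Counter(random_walk_value(start, steps, rng) for _ in range(samples))


def print_histogram(counts, width=60):
    """Print a sideways bar chart, one row per settled value."""
    peak = max(counts.values())
    for value in sorted(counts):
        print(f"{value:4d} {'#' * (counts[value] * width // peak)}")


if __name__ == "__main__":
    print_histogram(build_histogram())
```

Run as a script, this prints a sideways bar chart which should bulge in the middle and tail off at both ends. With more nudges per value the shape gets closer to the smooth bell curve.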

Bell curves come up in manufacturing. When particular components or devices are made, some are better than others - they just are because the world is effectively random. These same components can then be sold at different prices depending on how good they are. This applies, for example, to CPUs and TFT displays. There will inevitably be a few good-quality CPUs which can be sold for a higher price, and a few low-quality CPUs which can be sold for less than the average price, and most CPUs will be in the middle. All of them come off the same production line on the same day, but they will never be quite the same. Whatever factors are involved in influencing their quality will tend to average out, so most of them will be near the middle. A few will be lucky and everything will go right for them, and a few will be unlucky and everything will go wrong. A few more will be slightly more lucky or less lucky and so on. This is similar to a value being bounced up and down by tiny amounts until it's left alone. It's also probably worth noting that the fewer factors are involved, the less likely the distribution is to be "normal". Another example is time to failure. A few things will break almost immediately, most will last a bit longer and some will go on almost forever.

So that's all fine then, but there's more. Some things are ordered 80:20, roughly. The 80:20 rule operates in business, in mine in fact. In my business, I usually gave my customers a product with five ingredients and they tended to use me once a month for various periods of time, and I would also be available for advice on the end of a phone, by Zoom or whatever. I found that one out of five customers provided four fifths of my income. I also found that four fifths of the ingredients were only used about a fifth of the time and one fifth was used four fifths of the time. This means that a kind of mirage might occur where one might expect to have the other 80% of my customers provide the same amount of business per head as the 20%, thereby increasing my income fourfold. This is not, however, really possible. It's possible to increase profit overall, presumably, but the pattern would remain. It's also necessary to keep rarely used products in stock, because sometimes they will have to be provided. It isn't, however, feasible to cause those products to be used proportionately more. This is the 80:20 rule, also known as the Pareto principle, and the pattern behind it can be drawn as a heavily skewed curve such as the log-normal distribution:

This time I've left the values on the axes to indicate the logarithmic nature of the distribution. Different industries follow the distribution more or less exactly. It can be difficult or impossible to "fix" the general ratio, so trying to do so is a waste of energy and a drain on profits. However, if your business shows a ratio radically unlike the usual ratio for that kind of business, there may be a problem (or you might be doing unusually well of course. As if!).
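To see how a log-normal curve can give an 80:20 split, here is a small sketch which works out what share of the total is held by the top fifth of a log-normal population. It relies on the standard Lorenz-curve result for the log-normal, L(p) = Φ(Φ⁻¹(p) − σ), where Φ is the normal distribution's cumulative curve and σ is the spread of the logarithms. The function name `top_share` and the sample values of σ are made up for illustration.

```python
from statistics import NormalDist


def top_share(sigma, top_fraction=0.2):
    """Share of the total held by the top `top_fraction` of a log-normal population.

    Uses the log-normal Lorenz curve L(p) = Phi(Phi^-1(p) - sigma), where Phi is
    the standard normal CDF and sigma is the spread of the underlying logarithms.
    """
    std = NormalDist()
    # Share of the total held by everyone *below* the top group.
    bottom_share = std.cdf(std.inv_cdf(1 - top_fraction) - sigma)
    return 1 - bottom_share


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 1.683, 2.5):
        print(f"sigma={sigma}: top 20% hold {top_share(sigma):.0%}")
```

A spread of roughly 1.68 is where the top fifth ends up with about four fifths of the total. Smaller spreads give a more even split and larger ones a more lopsided one, which fits the point that different industries sit at different ratios.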

This applies to all sorts of other phenomena. For instance, there are relatively few long rivers and tributaries but relatively many short ones, and relatively few large islands but many small ones. Crucially, this also applies to language, and it's vital to remember this in particular.

English has about a million words overall by some counts, although many of them are just different spellings of the same words, such as "ynpossybul" for "impossible". Around eighty-eight of them make up over half of all English text. 12.7% of letters used in English text are the letter "E", followed by "T" at 9.1%. Twenty percent of twenty-six, the number of letters in the English alphabet, is 5.2. Adding up the frequency of the five most common letters in English text brings the figure to 44% and six takes it over fifty percent. These are, incidentally, E, T, A, O, I and N. The same kind of thing applies to lengths of words. Although other written languages will have different sounds, letters, words and so forth in the same positions, the pattern of distribution will always be similar.
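The letter arithmetic can be checked with a few lines of Python. In this sketch the E and T percentages are the ones quoted above, while the figures for A, O, I and N are assumed values taken from commonly quoted frequency tables. `letter_percentages` is a hypothetical helper for running the same count on any text of your own.

```python
from collections import Counter

# Approximate percentages of English text made up by each letter.
# E and T are the figures quoted above; the other four are assumed values.
COMMON_LETTERS = {"E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0, "N": 6.7}


def cumulative_share(percentages, top_n):
    """Add up the percentages of the `top_n` most frequent letters."""
    ranked = sorted(percentages.values(), reverse=True)
    return sum(ranked[:top_n])


def letter_percentages(text):
    """Count the letters in `text` and return each one's share as a percentage."""
    letters = [ch.upper() for ch in text if ch.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    return {letter: 100 * n / len(letters) for letter, n in counts.items()}


if __name__ == "__main__":
    print(f"Top five letters: {cumulative_share(COMMON_LETTERS, 5):.1f}%")
    print(f"Top six letters:  {cumulative_share(COMMON_LETTERS, 6):.1f}%")
```

Feeding a long passage of ordinary English into `letter_percentages` and then `cumulative_share` should give totals in the same region, though short passages can stray a long way from the averages.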

This, incidentally, is an issue with glossolalia. Some Christian groups believe in speaking in tongues, that some people are inspired to speak in languages they don't know. When this is recorded, the distribution is not close to log-normal. This is unlike, for instance, whale sounds, which do obey this law. In response it's sometimes claimed that the tongues are "angelic languages", but if that's so, angelic languages don't obey this almost universal law. It's more widely accepted that it's simply babbling, at least in the sense of how it sounds - it's more complicated than that but for now I'll leave it at that.

Now suppose you want to encode a piece of language by replacing letters with other letters. A simple way to do this is to count the same number of places forward in the alphabet and to wrap around when you reach the end. This would be very easy to decode. Another way would be to rearrange the alphabet randomly and substitute the letters that way. Both of these methods are badly flawed because the cipher can be cracked very easily. For instance, the common one-letter words in English are "I" and "a". A single-letter word will represent either one or the other, and immediately more than a seventh of the letters are known. The most common letter is likely to represent E. That adds up to more than a quarter. Most of the three-letter words ending in E will be "the", so that's two more, and so on. It's dead easy to break the code.
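Here is a minimal sketch of the first method, shifting every letter the same number of places, together with the crudest possible attack on it: assume the most common letter in the scrambled text stands for E and work out the shift from that. The names `caesar` and `guess_shift` are made up for this example, and the guess only works on text long enough for E to come out on top.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase


def caesar(text, shift):
    """Shift each letter `shift` places along the alphabet, wrapping at the end."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)


def guess_shift(ciphertext):
    """Guess the shift by assuming the most common cipher letter stands for E."""
    counts = Counter(ch for ch in ciphertext.upper() if ch in ALPHABET)
    most_common = counts.most_common(1)[0][0]
    return (ALPHABET.index(most_common) - ALPHABET.index("E")) % 26


if __name__ == "__main__":
    secret = caesar("MEET ME HERE WHEN THE EVENING BELLS RING", 3)
    shift = guess_shift(secret)
    print(secret)
    print(caesar(secret, -shift))
```

A randomly rearranged alphabet takes more work than this, since each letter has to be pinned down separately, but the same counting gives the first few letters away.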

There are ways around it to some extent. For instance, using a one-time pad known only to the sender and the recipient would allow the letters to be shifted by different numbers of places each time in a pattern known only to them. That pad would have to have been securely communicated between the parties before the messages began to be sent. That's a sensible arrangement of course, but steps can also be taken to introduce noise into the signal which the recipient knows how to extract. The idea is to obscure the log-normal distribution of different parts of the message, so for example in English the most common six letters could be interspersed with the least common, which would level out the frequency distribution. Likewise with word length.
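The one-time pad idea can be sketched along these lines, assuming the pad is simply a list of shifts, one per letter, which both parties already hold. The names `make_pad` and `shift_with_pad` are hypothetical, and this is an illustration of the principle rather than something to trust with real secrets.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase


def make_pad(length):
    """Make a pad of `length` unpredictable shifts, each between 0 and 25."""
    return [secrets.randbelow(26) for _ in range(length)]


def shift_with_pad(text, pad, direction=1):
    """Shift each letter by the next number on the pad.

    Use direction=1 to encode and direction=-1 to decode with the same pad.
    """
    letters_needed = sum(ch in ALPHABET for ch in text.upper())
    if len(pad) < letters_needed:
        raise ValueError("the pad must have at least one shift per letter")
    pad_iter = iter(pad)
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            shift = next(pad_iter)
            out.append(ALPHABET[(ALPHABET.index(ch) + direction * shift) % 26])
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)


if __name__ == "__main__":
    message = "MEET ME AT NOON"
    pad = make_pad(len(message))
    secret = shift_with_pad(message, pad)
    print(secret)
    print(shift_with_pad(secret, pad, direction=-1))
```

Because every letter gets its own shift, the same letter in the message can come out as a different letter each time, so counting letters no longer helps. The catch is the one already mentioned: the pad has to be as long as the message and must never be reused.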

However, even if all this were done, a conspicuous message would still arouse suspicion. It helps if nobody realises a message is being sent. One way of doing this is steganography. This is where a message is concealed in another message. Cicada 3301 did this, in one instance quite crudely by simply including a line of text in a JPG, but there are more subtle ways of doing it. It's been suggested that the Voynich Manuscript does this by modifying the shapes of the characters, and it could be done in various ways with text, such as using Morse code with dots represented by the dots on I's and J's and dashes by the crosses on T's.
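As a toy illustration of hiding one text inside another, the sketch below uses a stand-in technique, not any of the ones mentioned above: each gap between words in an innocent-looking cover text carries one bit, with a double space for 1 and a single space for 0. The function names are invented for the example, the hidden message is assumed to be plain ASCII, and the recipient is assumed to know its length in advance.

```python
import re


def to_bits(message):
    """Turn an ASCII message into a flat list of bits, eight per character."""
    return [int(b) for byte in message.encode("ascii") for b in f"{byte:08b}"]


def from_bits(bits):
    """Turn a list of bits back into text, ignoring any incomplete final byte."""
    usable = len(bits) - len(bits) % 8
    data = bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, usable, 8)
    )
    return data.decode("ascii")


def hide(message, cover_text):
    """Hide `message` in the gaps between the words of `cover_text`."""
    words = cover_text.split()
    bits = to_bits(message)
    if len(bits) > len(words) - 1:
        raise ValueError("cover text needs more gaps between words than hidden bits")
    pieces = []
    for i, word in enumerate(words[:-1]):
        gap = "  " if i < len(bits) and bits[i] else " "
        pieces.append(word + gap)
    pieces.append(words[-1])
    return "".join(pieces)


def reveal(stego_text, length):
    """Read `length` hidden characters back out of the gaps between words."""
    gaps = re.findall(r" +", stego_text)
    bits = [1 if len(gap) == 2 else 0 for gap in gaps][: length * 8]
    return from_bits(bits)
```

Each hidden character needs eight gaps, so `hide("Hi", cover)` needs a cover text of at least seventeen words, and `reveal(result, 2)` reads the two characters back. The words themselves are untouched, which is the point: nothing about the text looks like a cipher.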

So that's all I think I need to say for now. I could also mention the "scissors" language but that's for another time really.
