Statistical Properties of the Entropy Function of a Random Partition

Anna Movsheva



Contents

1 Introduction
  1.1 Background
  1.2 Research Problem
  1.3 Hypothesis
      1.3.1 General properties of θ(p, x)
      1.3.2 Functions µ(p) and σ(p)
      1.3.3 The Generating Function of Moments
      1.3.4 Discussion of Conjecture 1.4
  1.4 Significance

2 Methods

3 Results
  3.1 Computation of βr and γr
  3.2 Bell Trials

4 Discussion and Conclusion
Abstract

   It is well known that living organisms are open self-organizing thermodynamic systems with a low entropy. An estimate for the number of subsystems with low entropy would give a rough guess about the number of self-organizing subsystems that exist in a closed system S. I study the mathematical properties of a model in which a finite set X with a probability distribution $\{p_x \mid x \in X\}$ encodes the set of states of the system S. In this model a partition $X = \bigcup_{i=1}^{l} Y_i$ of the set X represents a subsystem with the set of probabilities $\{p(Y_i) = \sum_{x \in Y_i} p_x\}$. In this paper I study the entropy function $H(p, Y) = -\sum_i p(Y_i) \ln p(Y_i)$ of a random partition Y. In particular I study the counting function $\Theta(p, x) = \#\{Y \mid H(p, Y) \le x\}$. Using computer simulations, I give evidence that the normalized function $\theta(p, x) = \Theta(p, x)/\Theta(p, H(p, X))$ can asymptotically be approximated by the cumulative Gauss distribution $\frac{1}{\sqrt{2\pi}\,\sigma(p)} \int_{-\infty}^{x} \exp\!\left(-\frac{(t - \mu(p))^2}{2\sigma(p)^2}\right) dt$. I state my findings in the form of falsifiable conjectures, some of which I partly prove. The asymptotics explain a strong correlation between µ(p), the average entropy of a random partition of X, and the entropy H(p, X). Since the quantity µ(p) is usually available in practice, I can give an estimate for H(p, X) when it is not directly computable.


1     Introduction

1.1    Background

One of the main problems of theoretical biology and theoretical physics is to reconcile the theory of evolution with statistical mechanics and thermodynamics. Ilya Prigogine was the first to make fundamental contributions to the solution of this problem. He advocated that living organisms are open self-organizing thermodynamic systems with a low entropy. These open systems are part of a large closed system S. Since I am interested in open self-organizing thermodynamic systems, it is important to know the number of subsystems within S that have low entropy. In my work I studied this question from the mathematical point of view. In my simplified approach the configuration space of S is a finite set X with a probability distribution. In my interpretation a subsystem is a partition of X. I studied a function that, for a given x, counts the number of partitions of X whose entropy does not exceed x. My approach is rather general because any configuration space can be approximated by a sufficiently large but finite set.

   The controversy between classical biology and physics has a long history. It revolves around the paradox that physical processes are reversible while biological ones are not. Boltzmann, in the process of working on this dilemma, laid the foundation of statistical physics. He put forward the notion of entropy, which characterizes the degree of disorder in a statistical system. The second law of thermodynamics in the formulation of Boltzmann states that the entropy of a closed system cannot decrease, which makes time in a statistical system irreversible. The resolution of the irreversibility of time did not completely eliminate the contradiction: the second law of thermodynamics seems to forbid the long-term existence of organized systems, such as living organisms. Schrödinger in his book [19] (Chapter 6) pointed out that the entropy can go down in an open system, that is, a system that can exchange mass and energy with its surroundings. Prigogine in his groundbreaking works [15, 14, 16] showed that self-organization (a decrease of entropy) can be achieved dynamically. His discovery laid the foundation of non-equilibrium statistical mechanics. The most interesting self-organizing systems exist far from equilibrium and are non-static by their nature.

   There is a vast literature on self-organization (see e.g. [16, 10, 9, 12] and the references therein).




Current research is focused on the detailed study of individual examples of self-organization and is very successful (see e.g. [3]). In this work I changed the perspective. My motivating problem was rather general: to estimate the total number of self-organizing subsystems in a thermodynamically closed system. Self-organizing subsystems are the most interesting specimens of the class of subsystems with a low entropy. This motivates my interest in estimating the number of subsystems with a low entropy; knowing this number, the number of self-organizing subsystems can be assessed. A problem stated in such generality looks very hard, so I made a series of simplifications that let me progress in this direction. Ashby in [1] argued that any system S can be thought of as a "machine". His idea is that the configuration space of S can be approximated by a set or an alphabet X, and the dynamics is given by the transition rule $T_X : X \to X$. A homomorphism between machines $S = (X, T_X)$ and $Q = (Z, T_Z)$ is a map $\psi : X \to Z$ such that $\psi T_X = T_Z \psi$. Homomorphisms are useful in the analysis of complicated systems (see [1] for details). A submachine, according to [1], is a subset $X' \subset X$ that is invariant with respect to $T_X$. I never use this definition in this paper. In my definition a submachine is a homomorphic image $\psi : (X, T_X) \to (Z, T_Z)$. For example, if a machine (X, T) consists of N non-interacting submachines $(X_1, T_1), \ldots, (X_N, T_N)$, then $X = X_1 \times \cdots \times X_N$ and $T = T_1 \times \cdots \times T_N$. The projections $\psi_i(x_1, \ldots, x_N) = x_i$ are homomorphisms of machines. This reflects the fact that the configuration space of a union of non-interacting systems is a product (not a union) of the configuration spaces of the components.

Definition 1.1. A collection of subsets $Y = \{Y_z \mid z \in Z\}$ such that $Y_z \cap Y_{z'} = \emptyset$ for $z \neq z'$ and $\bigcup_{z \in Z} Y_z = X$ is a partition of a finite set X, $r = \#X$. Let $k_z$ denote the cardinality $\#Y_z$. In this paper I shall use the notation $X = \bigsqcup_z Y_z$.

   Any homomorphism $\psi : (X, T_X) \to (Z, T_Z)$ defines a partition of X with $Y_z$ equal to $\{x \in X \mid \psi(x) = z\}$. In fact, up to relabeling the elements of Z, the homomorphism is the same as a partition. This also explains why I am interested in counting the partitions. Ashby in [1] argued that a machine (X, T) is a limiting case of a more realistic Markov process, in which the deterministic transition rule $x \to T(x)$ gets replaced by a random transition rule $x \to \tilde{T}(x)$. The dynamics of the process is completely determined by the probabilities $\{p_{x',x} \mid x, x' \in X\}$ of passing from the state x to the state x' and by the initial probability distribution $\{p_x \mid x \in X\}$. Markov processes have been studied in information theory, developed originally in [20].



   Yet there is still another way to interpret the quantities that I would like to compute. A submachine can also be interpreted as a scientific device. This can be understood through the example of a hurricane on Jupiter [2]. One can analyze the hurricane in a multitude of ways: visually through the lenses of a telescope, by recording the fluctuations of winds with a probe, or by capturing the fluctuations of the magnetic field around the hurricane. Every method of analysis (device) gives statistical data that yields in turn the respective entropy. If (X, p) is a space of states of the hurricane, then $\psi : X \to Z$ is a function whose set of values is the set of readings of the scientific device. It automatically leads to a partition of X, as was explained above. The list of known scientific methods in planetary science is enormous [13], and any new additional method contributes something to the knowledge. Yet full understanding of the subject would only be possible if one used all possible methods ($\psi$s). This, however, is not going to happen in planetary science in the near future. The reason is that the set of states X of Jupiter's atmosphere is colossal, which makes the set of all conceivable methods of its study (devices) even bigger.

   Still, imagine that all the mentioned troubles were nonexistent. It would be interesting to count the number of scientific devices that yield statistical data about the hurricane with entropy no greater than a given value. It would also be interesting to know their average entropy. This is a dream. I did just that in my oversimplified model.


1.2   Research Problem

In the following, the set X will be {1, . . . , r}. Let p be a probability distribution on X, that is, a collection of numbers $p_i \ge 0$ such that $\sum_{i=1}^{r} p_i = 1$. The array $p = (p_1, \ldots, p_r)$ is said to be a probability vector. The probability of $Y_i$ in the partition $X = \bigsqcup_i Y_i$ is

$$p(Y_i) = \sum_{j \in Y_i} p_j.$$

Definition 1.2. The entropy of a partition Y, denoted H(p, Y), is given by the expression $-\sum_{i=1}^{l} p(Y_i) \ln p(Y_i)$. In this definition the function $x \ln x$ is extended to $x = 0$ by continuity: $0 \ln 0 = 0$.

   Here are some examples of entropies: $H(p, Y_{max}) = -\sum_{i=1}^{r} p_i \ln p_i$ for $Y_{max} = \{\{1\}, \ldots, \{r\}\}$, and $H(p, Y_{min}) = 0$ for $Y_{min} = \{\{1, \ldots, r\}\}$. One of the properties of the entropy function (see [6]) is that

$$H(p, Y_{min}) \le H(p, Y) \le H(p, Y_{max}) \quad \text{for any } Y \in P_r, \tag{1}$$

where $P_r$ denotes the set of all partitions of X.

   It is clear from the previous discussion that $\Theta(p, x) = \#\{Y \in P_r \mid H(p, Y) \le x\}$ is identical to the function defined in the abstract.

   The Bell number $B_r$ ([22], [17]) is the cardinality of $P_r$. Thanks to (1), the value $\Theta(p, H(p, Y_{max}))$ coincides with $B_r$. From this I conclude that

$$\theta(p, x) = \frac{\#\{Y \in P_r \mid H(p, Y) \le x\}}{B_r}$$

is the function defined in the abstract.

   My main goal is to find a simple approximation to θ(p, x).
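   For cardinalities within reach of brute force, these definitions translate directly into code. The sketch below is a minimal Python illustration (my actual computations were done in Mathematica [11], and all function names here are my own): it enumerates the $B_r$ partitions of {0, . . . , r − 1} and evaluates θ(p, x).

```python
from math import log

def set_partitions(elems):
    """Yield every partition of the list `elems` as a list of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):      # put `first` into an existing block
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller          # or into a new singleton block

def H(p, Y):
    """Entropy H(p, Y) of a partition Y, with the convention 0 ln 0 = 0."""
    total = 0.0
    for block in Y:
        q = sum(p[j] for j in block)
        if q > 0.0:
            total -= q * log(q)
    return total

def theta(p, x):
    """theta(p, x) = #{Y : H(p, Y) <= x} / B_r, by exhaustive enumeration."""
    parts = list(set_partitions(list(range(len(p)))))   # all B_r partitions
    return sum(1 for Y in parts if H(p, Y) <= x) / len(parts)

print(theta((0.1, 0.2, 0.3, 0.4), 1.0))   # r = 4, so B_4 = 15 partitions
```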


1.3     Hypothesis

In this section I formulate the conjectures that I obtained with the help of the computer algebra system Mathematica [11].

Remark 1.3. I equip the set $P_r$ with the probability distribution $\mathbb{P}$ such that $\mathbb{P}(Y) = 1/B_r$ for every $Y \in P_r$. The value of the function θ(p, x) is then the probability that a random partition Y has entropy ≤ x. This explains the adjective "random" in the title of the paper.

   In order to state the main result I need to fix some notation:

$$p[k] = (p_1, \ldots, p_r, \underbrace{0, \ldots, 0}_{k}), \tag{2}$$

where $p = (p_1, \ldots, p_r)$ is a probability vector. From the set of moments of the entropy of a random partition,

$$E(H^l(p, Y)) = \frac{1}{B_r} \sum_{Y \in P_r} H^l(p, Y), \tag{3}$$

I will use the first two to define the average $\mu(p) = E(H(p, Y))$ and the standard deviation $\sigma(p) = \sqrt{E(H(p, Y)^2) - E(H(p, Y))^2}$.

Conjecture 1.4. Let p be a probability distribution on {1, . . . , r}. Then

$$\lim_{k \to \infty} \left[ E(H^l(p[k], Y)) - \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x^l e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \right] = 0$$

with $\mu = \mu(p[k])$ and $\sigma = \sigma(p[k])$, for any integer $l \ge 0$.

   Practically this means that the cumulative normal distribution

$$\mathrm{Erf}(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\, dt$$

with $\mu = \mu(p[k])$, $\sigma = \sigma(p[k])$ makes a good approximation to θ(p[k], x) for large k.
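   In this notation Erf(x, µ, σ) is simply the Gaussian cumulative distribution function, so it can be evaluated through the standard error function rather than by numerical integration. A one-function sketch (with σ the standard deviation, as defined above):

```python
from math import erf, sqrt

def gauss_cdf(x, mu, sigma):
    """Erf(x, mu, sigma) of Conjecture 1.4: the cumulative normal distribution."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))
```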

   The initial study of the function θ(p, x) was done with the help of Mathematica. The software can effectively compute the quantities associated with a set X whose cardinality does not exceed ten.


1.3.1     General properties of θ(p, x)

The plots of some typical graphs are presented in Figure 1.1. These were done with the help of Mathematica.
                                     Figure 1.1: Graphs of θ(p, x), θ(q, x).


   The continuous line on the graph corresponds to θ(p, x) with


        p = (0.082, 0.244, 0.221, 0.093, 0.052, 0.094, 0.079, 0.130)


The step function corresponds to $q = (\frac{1}{8}, \ldots, \frac{1}{8})$. Large steps are common for θ(q, x) when q has symmetries. A symmetry of q is a permutation τ of X such that $q_{\tau(x)} = q_x$ for all x ∈ X. Indeed, if I take a symmetry and act with it on a partition, I get another partition with the same entropy. In this way one symmetry produces many partitions with equal entropies; hence the high steps in the graph.

   The effect of the operation p → p[1] (see (2)) on θ(p, x) is surprising. Here are typical graphs:
   Figure 1.2: Graphs of θ(p, x), θ(p[1], x), θ(p[2], x) for some randomly chosen p = (p1, . . . , p6).


   The reader can see that the graphs have the same bending patterns. Also, the graphs lie one over the other. This prompted a conjecture that has passed multiple numerical tests.

Conjecture 1.5. For any p I have


     θ(p, x) ≥ θ(p[1], x)


   A procedure that plots θ(p, x) is hungry for computer memory. This is why it is worthwhile to

find a function that makes a good approximation. I have already mentioned in the introduction

that Erf(x, µ(p), σ(p)) approximates θ(p, x) well. For example, if


     p = (0.138, 0.124, 0.042, 0.106, 0.081, 0.131, 0.088, 0.138, 0.154),                                   (4)


the picture below indicates good agreement between the graphs.




                    Figure 1.3: Erf(x, µ(p), σ(p)) (red) vs. θ(p, x) (blue), with p as in (4).


   The reader will find more precise relations between Erf and θ in the following sections.


1.3.2        Functions µ(p) and σ(p)

The good agreement between the graphs of Erf(x, µ(p), σ(p)) and θ(p, x) calls for a detailed analysis of the functions µ(p) and σ(p). It turns out that quantities more manageable than µ(p) are

$$\beta(p) = H(p, Y_{max}) - \mu(p), \qquad \gamma(p) = \frac{H(p, Y_{max})}{\mu(p)}. \tag{5}$$

The inequality (1) implies that $\mu(p) \le H(p, Y_{max})$, hence $\beta(p) \ge 0$ and $\gamma(p) \ge 1$. Evaluating the denominator of γ(p) with formula (3) requires intensive computation. On my slow machine I used the Monte Carlo approximation [8]

$$\mu(p) \approx \frac{1}{k} \sum_{i=1}^{k} H(p, Y^i),$$

where the $Y^i$ are independent uniformly distributed random partitions.
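   A uniform random partition can be sampled without enumerating $P_r$. One known recipe (Stam's urn method, based on Dobinski's formula $B_r = \frac{1}{e}\sum_{m \ge 0} m^r/m!$) picks a number m of labeled urns with probability $m^r/(e\, m!\, B_r)$ and drops the r elements into the urns independently and uniformly; the nonempty urns then form a uniformly distributed partition. A minimal sketch (all function names are mine), reusing H from the Section 1.2 snippet:

```python
import math, random

def bell_list(n):
    """Bell numbers B_0 .. B_n via the recurrence B_{m+1} = sum_k C(m, k) B_k."""
    B = [1]
    for m in range(n):
        B.append(sum(math.comb(m, k) * B[k] for k in range(m + 1)))
    return B

def random_partition(r):
    """Uniform random partition of {0, ..., r-1} via Stam's urn method.
    Float arithmetic is fine here for modest r (say r <= 20)."""
    Br = bell_list(r)[r]
    u, m, acc = random.random(), 0, 0.0
    while acc < u:                           # P(m urns) = m^r / (e * m! * B_r)
        m += 1
        acc += m ** r / (math.e * math.factorial(m) * Br)
    urns = [[] for _ in range(m)]
    for x in range(r):                       # each element picks an urn uniformly
        urns[random.randrange(m)].append(x)
    return [block for block in urns if block]   # nonempty urns = the partition

def mu_estimate(p, k=20000):
    """Monte Carlo estimate of mu(p) = E H(p, Y), reusing H from Section 1.2."""
    r = len(p)
    return sum(H(p, random_partition(r)) for _ in range(k)) / k
```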

   Below are the graphs of β(p1, p2, p3) and γ(p1, p2, p3), plotted in Mathematica. The reader can distinctly see one maximum in the center, corresponding to p = (1/3, 1/3, 1/3).




      Figure 1.4: The plot of β(p1 , p2 , 1 − p1 − p2 )             Figure 1.5: The plot of γ(p1 , p2 , 1 − p1 − p2 )


     A closer look at the plot shows that γ(p1 , p2 , p3 ) is not a concave function.




   In the following, $h_r$ stands for the probability vector $(\frac{1}{r}, \ldots, \frac{1}{r})$.

      I came up with a conjecture, which has been numerically tested for r ≤ 9:

Conjecture 1.6. The function γ(p1, . . . , pr) can be extended by continuity to all distributions p. In this bigger domain it satisfies

$$1 \le \gamma(p) \le \gamma(h_r) \stackrel{\mathrm{def}}{=} \gamma_r. \tag{6}$$

Likewise the function β satisfies

$$0 \le \beta(p) \le \beta(h_r) \stackrel{\mathrm{def}}{=} \beta_r. \tag{7}$$


   The reader should consult the sections below for alternative ways of computing βr and γr.

   The following table contains an initial segment of the sequence {γr}.

Table 1: Values of γr.

  r      2      3       4       5       6       7       8       9      ...   100     ...   1000
  γr     2      1.826   1.739   1.691   1.659   1.635   1.617   1.602  ...   1.426   ...   1.341


   The table suggests that {γr} is a decreasing sequence. Extensive computer tests led me to the following conjecture.

Conjecture 1.7. The sequence {γr} satisfies $\gamma_r \ge \gamma_{r+1}$ and $\lim_{r \to \infty} \gamma_r = 1$.


      The limit statement is proved in Proposition 3.6.

Corollary 1.8. $\lim_{t \to \infty} \gamma(p[t]) = 1$.

Proof. From Conjecture 1.6 I conclude that $1 \le \gamma(p[t]) \le \gamma_{r+t}$. Since $\lim_{t \to \infty} \gamma_{r+t} = 1$ by Conjecture 1.7, $\lim_{t \to \infty} \gamma(p[t]) = 1$.




Table 2: Values of βr.

  r      6          7          8          9          ...   100      ...
  βr     0.711731   0.756053   0.793492   0.825835   ...   1.3943   ...





Conjecture 1.9. The sequence {βr} satisfies $\beta_r \le \beta_{r+1}$ and $\lim_{r \to \infty} \beta_r = \infty$.


    The situation with the standard deviation σ(p) is a bit more complicated. Here is a graph of

σ(p1 , p2 , p3 ).




  Figure 1.6: Three-dimensional view of the graph of standard deviation σ(p1 , p2 , p3 ) for θ(p, x).


   The reader can clearly see four local maxima. The function σ(p1, p2, p3) is symmetric. The maxima correspond to the point $(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$ and the permutations of $(\frac{1}{2}, \frac{1}{2}, 0)$. This led me to think that the local maxima of σ(p1, . . . , pr) are permutations of $q_{k,r} = h_k[r - k]$, $k \le r$. I tabulated the values of σ(q_{k,r}) for small k and r in the table below.


Table 3: Values of σ(q_{k,r}).

  k\r     3        4        5        6        7        8        9
   2      0.3396   0.3268   0.314    0.3026   0.2924   0.2832   0.275
   3      0.35     0.3309   0.3173   0.3074   0.2992   0.292    0.286
   4      -        0.3254   0.309    0.298    0.29     0.283    0.278
   5      -        -        0.302    0.289    0.28     0.273    0.267
   6      -        -        -        0.283    0.272    0.265    0.258
   7      -        -        -        -        0.267    0.258    0.251
   8      -        -        -        -        -        0.254    0.246
   9      -        -        -        -        -        -        0.242



   The reader can see that the row k = 3 has the largest value in each column. It is not hard to see analytically that $q_{k,r}$ is a critical point of σ. My computer experiments led me to the following conjecture:

Conjecture 1.10. The function σ(p) has a global maximum at q3,r .
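   Conjecture 1.10 can be spot-checked by brute force for small r. A sketch reusing set_partitions and H from the Section 1.2 snippet; the printed values should agree with the corresponding column of Table 3.

```python
from math import sqrt

def sigma(p):
    """Standard deviation of H(p, Y) over uniform random partitions,
    by exhaustive enumeration (practical up to r of about 10)."""
    parts = list(set_partitions(list(range(len(p)))))
    hs = [H(p, Y) for Y in parts]
    mean = sum(hs) / len(hs)
    return sqrt(sum(h * h for h in hs) / len(hs) - mean * mean)

def q_kr(k, r):
    """The vector q_{k,r} = h_k[r - k]: k weights 1/k padded by r - k zeros."""
    return [1.0 / k] * k + [0.0] * (r - k)

r = 7
print({k: round(sigma(q_kr(k, r)), 4) for k in range(2, r + 1)})  # k = 3 should win
```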






1.3.3     The Generating Function of Moments

In order to test Conjecture 1.4 I need an effective way of computing $E(H^l(p[k], Y))$ for large values of k. In this section I present my computations of $E(H^l(p[k], Y))$ for small r, which led me to a conjectural formula for $E(H^l(p[k], Y))$.

   The factorial generating function of the powers of the entropy can be written compactly:

$$G(p, Y, s) = \sum_{t=0}^{\infty} \frac{H(p, Y)^t s^t}{t!} = \sum_{t=0}^{\infty} \frac{\left( -\sum_{i=1}^{l} p(Y_i) \ln p(Y_i) \right)^t s^t}{t!} = \prod_{i=1}^{l} p(Y_i)^{-p(Y_i) s}. \tag{8}$$


The function G(p, Y, s) can be extended from $P_r$ to $P_{r+1}$ in the following way. I extend the r-dimensional probability vector p to an (r + 1)-dimensional vector p′ by adding a zero coordinate. Any partition $Y = \{Y_1, \ldots, Y_l\}$ defines a partition $Y' = \{Y_1, \ldots, Y_l, \{r + 1\}\}$. Note that $G(p, Y, s) = G(p', Y', s)$.

   The following generating function, after normalization, encodes all the moments of the entropy of a random partition:

$$J(p, s) = \sum_{Y \in P_r} G(p, Y, s), \qquad J(p, s)/B_r = \sum_{l \ge 0} E(H^l(p, Y))\, s^l / l!. \tag{9}$$


I want to explore the effect of the substitution p → p[k] on J(p, s). I use the notation

$$A_t(p, s) = J(p[t], s).$$


Here are the results of my computer experiments. A probability vector with two non-zero coordinates extended by t zeros yields

$$A_t(p_1, p_2, -s) = B_{t+1} + (B_{t+2} - B_{t+1})\, p_1^{p_1 s} p_2^{p_2 s}. \tag{10}$$






The next formula is for three non-zero coordinates extended by zeros:

$$\begin{aligned}
A_t(p_1, p_2, p_3, -s) = {}& B_{t+1} + (B_{t+2} - B_{t+1}) \times \\
& \times \left[ (p_1 + p_2)^{s(p_1 + p_2)} p_3^{p_3 s} + (p_1 + p_3)^{s(p_1 + p_3)} p_2^{p_2 s} + (p_2 + p_3)^{s(p_2 + p_3)} p_1^{p_1 s} \right] \\
& + (B_{t+3} - 3B_{t+2} + 2B_{t+1})\, p_1^{p_1 s} p_2^{p_2 s} p_3^{p_3 s}.
\end{aligned} \tag{11}$$


I found $A_t(p, s)$ for probability vectors p with five or fewer coordinates. In order to generalize the results of my computation I have to fix some notation. With $\deg Y = \deg\{Y_1, \ldots, Y_l\} = l$, I set $J^l(p, s) \stackrel{\mathrm{def}}{=} \sum_{\deg Y = l} G(p, Y, s)$. The function $A_t(p, s)$ then takes the form

$$A_t(p, s) = \sum_{l=1}^{r} L(l, t)\, J^l(p, s), \tag{12}$$

where the L(l, t) are certain coefficients. For example, in the last line of formula (11) the coefficient L(3, t) is $B_{t+3} - 3B_{t+2} + 2B_{t+1}$ and the function $J^3(p, s)$ is $p_1^{p_1 s} p_2^{p_2 s} p_3^{p_3 s}$. The reader can see that the coefficients of $J^l(p, s)$ in formulae (10) and (11) coincide. The coefficients of the Bell numbers in the formulae for L(l, t),


$$\begin{gathered}
B_{t+1} \\
B_{t+2} - B_{t+1} \\
B_{t+3} - 3B_{t+2} + 2B_{t+1} \\
B_{t+4} - 6B_{t+3} + 11B_{t+2} - 6B_{t+1} \\
B_{t+5} - 10B_{t+4} + 35B_{t+3} - 50B_{t+2} + 24B_{t+1}
\end{gathered}$$


form a triangle. I took these constants 1, 1, −1, 1, −3, 2, 1, −6, 11, −6 and entered them into the

Google search window. The result of the search led me to the sequence A094638, Stirling numbers

of the first kind, in the Online Encyclopedia of Integer Sequences (OEIS [21]).

Definition 1.11. The unsigned Stirling numbers of the first kind are denoted by $\left[{n \atop k}\right]$. They count the number of permutations of n elements with k disjoint cycles [22].







Table 4: Values of the function L(l, t).

  l\t     1      2      3       4       5       ...
   1      2      5      15      52      203     ...
   2      3      10     37      151     674     ...
   3      4      17     77      372     1915    ...
   4      5      26     141     799     4736    ...
   5      6      37     235     1540    10427   ...
  ...



   The rows of this table are sequences A000110, A138378, A005494, A045379. OEIS provided me

with the factorial generating function for these sequences:

Conjecture 1.12.

$$L(l, t) = \left[{l \atop l}\right] B_{t+l} - \left[{l \atop l-1}\right] B_{t+l-1} + \cdots + (-1)^{l+1} \left[{l \atop 1}\right] B_{t+1}, \tag{13}$$

$$\sum_{t=0}^{\infty} \frac{L(l, t)\, z^t}{t!} = e^{lz + e^z - 1}. \tag{14}$$

The identity (12) holds for all values of t.
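   Formula (13) is easy to test against Table 4. Below is a sketch using SymPy (assuming a SymPy installation; its `stirling(n, k, kind=1, signed=False)` returns the unsigned Stirling numbers of the first kind, and `bell(n)` the Bell numbers; `L_coeff` is my own name):

```python
from sympy import bell
from sympy.functions.combinatorial.numbers import stirling

def L_coeff(l, t):
    """L(l, t) from the conjectural formula (13): an alternating sum of
    unsigned Stirling numbers of the first kind times shifted Bell numbers."""
    return sum((-1) ** (l - j) * stirling(l, j, kind=1, signed=False) * bell(t + j)
               for j in range(1, l + 1))

# Reproduce the upper-left corner of Table 4:
for l in range(1, 6):
    print([L_coeff(l, t) for t in range(1, 6)])
# expected rows: [2, 5, 15, 52, 203], [3, 10, 37, 151, 674], ...
```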


1.3.4     Discussion of Conjecture 1.4

Formula (12) simplifies the computation of $E(H^l(p[k], Y))$. Here is a sample computation of

$$D(p, l, k) = E(H^l(p[k], Y)) - \frac{1}{\sqrt{2\pi}\,\sigma(p[k])} \int_{-\infty}^{\infty} x^l\, e^{-\frac{(x - \mu(p[k]))^2}{2\sigma(p[k])^2}}\, dx$$

for p = (0.4196, 0.1647, 0.4156).







Table 5: Values of D(p, l, k).

  l\k      0         100       200       300       400       500
   3     -0.0166   -0.0077   -0.0048   -0.0036   -0.0029   -0.0024
   4     -0.0474   -0.0273   -0.0173   -0.0129   -0.0104   -0.0088
   5     -0.0884   -0.0617   -0.0393   -0.0294   -0.0237   -0.0200
   6     -0.1467   -0.1142   -0.0726   -0.0543   -0.0438   -0.0369



   The reader can see that the functions k → D(p, l, k) have a minimum for some k after which

they increase toward zero.


1.4   Significance

There are multitudes of possible devices that can be used for the study of a remote system. While some devices will convey a lot of information, others will be inadequate. Surprisingly, the majority of the devices (see Conjectures 1.6, 1.7, and 1.9) will measure an entropy very close to the actual entropy of the system. All that is asked is that the device satisfy the condition

$$\text{the map } \psi : X \to Z \text{ is onto,} \tag{15}$$

where Z is the set of readings of the device.

   The cumulative Gauss distribution [4] makes a good approximation to θ(p, x). The only parameters that have to be known are the average µ and the standard deviation σ. This gives an effective way of estimating θ(p, x). The precise meaning of the estimates can be found in Conjecture 1.4.

   My work offers a theoretical advance in the study of large complex systems through entropy analysis. The potential applications are in sciences that deal with complex systems, such as economics, genetics, biology, paleontology, and psychology. My theory explains some hidden relations between the entropies of observed processes in a system, and it can give insight into the object of study from incomplete information. According to my mentor, who is an expert in this field, this is an important problem to solve and a valuable contribution to science.






2       Methods

All of the conjectures were obtained with the help of Mathematica. My main theoretical tool is the theory of generating functions [22].

Definition 2.1. Let $a_k$, $k \ge 0$, be a sequence of numbers. The generating function corresponding to $a_k$ is the formal power series $\sum_{k \ge 0} a_k t^k$.


   My knowledge of Stirling numbers (see Definition 1.11) also comes from [22]. I also used Jensen's Inequality (Theorem 3.4) [6].



3       Results

3.1     Computation of βr and γr

The main result of this section is the pair of explicit formulae for βr (see formula (7)) and γr (see formula (6)):

$$\beta_r = \frac{\omega(r, 1)}{r B_r}, \qquad \gamma_r = \frac{1}{1 - \frac{\omega(r, 1)}{r B_r \ln r}}, \tag{16}$$

where

$$\omega(r, 1) = r! \sum_{i=0}^{r-1} \frac{B_i \ln(r - i)}{i!\, (r - i - 1)!}. \tag{17}$$
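   Formulae (16) and (17) involve only Bell numbers and logarithms, so they reach values of r far beyond what enumeration allows. A minimal sketch (function names are mine) that should reproduce Tables 1 and 2, reusing bell_list from the Section 1.3.2 snippet:

```python
import math
from fractions import Fraction

def beta_gamma(r):
    """(beta_r, gamma_r) from formulae (16) and (17); exact rational weights
    keep the very large Bell numbers from overflowing floats."""
    B = bell_list(r)   # bell_list from the Section 1.3.2 sketch
    beta = sum(
        float(Fraction(math.factorial(r) * B[i],
                       math.factorial(i) * math.factorial(r - i - 1) * r * B[r]))
        * math.log(r - i)
        for i in range(r))
    return beta, 1.0 / (1.0 - beta / math.log(r))

for r in (2, 3, 9, 100):
    print(r, beta_gamma(r))   # expect gamma_2 = 2, beta_9 ~ 0.8258, gamma_100 ~ 1.426
```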


   I set some notation. The probability of $Y_i$ is $\frac{k_i}{r} = \frac{\#Y_i}{r}$, and the entropy of Y is $H(Y) = H(h_r, Y) = -\sum_{i=1}^{l} \frac{k_i}{r} \ln \frac{k_i}{r}$. After some simplifications H(Y) becomes

$$H(Y) = \ln r - \frac{1}{r} \lambda(Y), \tag{18}$$

where

$$\lambda(Y) = \lambda(k_1 \ldots k_l) = \ln\left( k_1^{k_1} k_2^{k_2} \cdots k_l^{k_l} \right) = \sum_{i=1}^{l} k_i \ln k_i. \tag{19}$$




The average entropy is

$$E(H(h_r, Y)) = \ln r - \frac{\sum_{Y \in P_r} \lambda(Y)}{r B_r}. \tag{20}$$

I am interested in calculating the sums

$$\omega(r, q) = \sum_{Y \in P_r} \lambda(Y)^q, \quad q \ge 0. \tag{21}$$


The factorial generating function of the powers $\lambda(Y)^q$ is

$$\Lambda(Y, s) = \sum_{k=0}^{\infty} \frac{\lambda(Y)^k s^k}{k!} = k_1^{k_1 s} \cdots k_l^{k_l s}. \tag{22}$$

I will compute the factorial generating function of the quantities

$$\Lambda(r, s) = \sum_{Y \in P_r} \Lambda(Y, s). \tag{23}$$
                        Y ∈Pr


Theorem 3.1.

$$\sum_{r=0}^{\infty} \frac{\Lambda(r, s)\, t^r}{r!} = e^{F(s,t)}, \tag{24}$$

where $F(s, t) = \sum_{r=1}^{\infty} \frac{r^{rs} t^r}{r!}$.


Proof.

$$\begin{aligned}
e^{F(s,t)} &= \sum_{l=0}^{\infty} \frac{F(s, t)^l}{l!} = \sum_{l=0}^{\infty} \frac{1}{l!} \left( \sum_{k=1}^{\infty} \frac{k^{ks} t^k}{k!} \right)^{\!l} \\
&= \sum_{l=0}^{\infty} \frac{1}{l!} \sum_{k_1=1}^{\infty} \frac{k_1^{k_1 s} t^{k_1}}{k_1!} \sum_{k_2=1}^{\infty} \frac{k_2^{k_2 s} t^{k_2}}{k_2!} \cdots \sum_{k_l=1}^{\infty} \frac{k_l^{k_l s} t^{k_l}}{k_l!} \\
&= \sum_{l=0}^{\infty} \frac{1}{l!} \sum_{k_1 \ge 1, \ldots, k_l \ge 1} \frac{k_1^{k_1 s} k_2^{k_2 s} \cdots k_l^{k_l s}\, t^{k_1 + k_2 + \cdots + k_l}}{k_1!\, k_2! \cdots k_l!} \\
&= \sum_{l=0}^{\infty} \frac{1}{l!} \sum_{1 \le k_1 \le k_2 \le \cdots \le k_l} \frac{l!}{c_1!\, c_2! \cdots} \cdot \frac{k_1^{k_1 s} k_2^{k_2 s} \cdots k_l^{k_l s}\, t^{k_1 + k_2 + \cdots + k_l}}{k_1!\, k_2! \cdots k_l!}
\end{aligned} \tag{25}$$



The coefficient $c_i$ is the number of the k's that are equal to i. After some obvious simplifications the formula above becomes

$$e^{F(s,t)} = \sum_{r=0}^{\infty} \frac{t^r}{r!} \sum_{\substack{k_1 \le k_2 \le \cdots \le k_l \\ k_1 + \cdots + k_l = r}} \frac{r!}{c_1!\, c_2! \cdots} \cdot \frac{k_1^{k_1 s} k_2^{k_2 s} \cdots k_l^{k_l s}}{k_1!\, k_2! \cdots k_l!}. \tag{26}$$


   Each partition Y determines a set of numbers $k_i = \#Y_i$. I will refer to $k_1, \ldots, k_l$ as the portrait of $\{Y_1, \ldots, Y_l\}$. Let me fix one collection of numbers $k_1 \le \cdots \le k_l$; I can always assume that the sequence is non-decreasing. Let me count the number of partitions with the given portrait. If the subsets were ordered, the number of partitions would equal $\frac{(k_1 + k_2 + \cdots + k_l)!}{k_1!\, k_2! \cdots k_l!}$. In my case the subsets are unordered, and the number of such unordered partitions is $\frac{(k_1 + k_2 + \cdots + k_l)!}{k_1!\, k_2! \cdots k_l!\, c_1!\, c_2! \cdots}$, where $c_i$ is the number of subsets of cardinality i. The function $\Lambda(Y, s)$ depends only on the portrait of Y. From this I conclude that

$$\sum_{Y \in P_r} \Lambda(Y, s) = \sum_{k_1 \le k_2 \le \cdots \le k_l} \frac{(k_1 + k_2 + \cdots + k_l)!\, k_1^{k_1 s} k_2^{k_2 s} \cdots k_l^{k_l s}}{k_1!\, k_2! \cdots k_l!\, c_1!\, c_2! \cdots}, \tag{27}$$

which yields the proof.

   Note that upon the substitution s = 0 formula (24) becomes the classical generating function

$$\sum_{k \ge 0} \frac{B_k t^k}{k!} = e^{e^t - 1} \tag{28}$$

(see [22]).
                                                                                      ∞ tr
    My knowledge lets me find the generating function                                  r=0 r! ω(r, 1).


Proposition 3.2.

$$\sum_{r=0}^{\infty} \frac{t^r \omega(r, 1)}{r!} = e^{e^t - 1} \sum_{k=1}^{\infty} \frac{t^k \ln k}{(k - 1)!}. \tag{29}$$






Proof. Using equations (22), (23), and (21) I obtain

$$\sum_{r=0}^{\infty} \frac{\partial}{\partial s} \frac{\Lambda(r, s)\, t^r}{r!} \bigg|_{s=0} = \sum_{r=0}^{\infty} \frac{t^r}{r!} \sum_{Y \in P_r} \lambda(Y) = \sum_{r=0}^{\infty} \frac{\omega(r, 1)\, t^r}{r!}.$$

Alternatively, I find the partial derivative $\sum_{r=0}^{\infty} \frac{\partial}{\partial s} \frac{\Lambda(r, s) t^r}{r!} \big|_{s=0}$ with the chain rule applied to the right-hand side of (24): $\frac{\partial}{\partial s}\left[ e^{F(s,t)} \right]\big|_{s=0} = e^{F(s,t)}\, \frac{\partial}{\partial s}\left[ F(s, t) \right]\big|_{s=0}$. Note that $F(s, t)|_{s=0} = e^t - 1$ and $\frac{\partial}{\partial s}\left[ F(s, t) \right]\big|_{s=0} = \sum_{k=1}^{\infty} \frac{t^k\, k \ln k}{k!}$. From this I infer that

$$\frac{\partial}{\partial s}\left[ e^{F(s,t)} \right]\bigg|_{s=0} = e^{e^t - 1} \sum_{k=1}^{\infty} \frac{t^k\, k \ln k}{k!}.$$




   I want to find an explicit formula for ω(r, 1). To my advantage I know that $e^{e^t - 1} = \sum_{n=0}^{\infty} \frac{B_n t^n}{n!}$, where $B_n$ is the Bell number, i.e. the number of unordered partitions that can be made out of a set of n elements [22]. To find ω(r, 1) I expand equation (29):

$$\sum_{r=0}^{\infty} \frac{t^r \omega(r, 1)}{r!} = \sum_{n=0}^{\infty} \frac{B_n t^n}{n!} \sum_{k=1}^{\infty} \frac{\ln k\, t^k}{(k-1)!} = \frac{B_0 \ln 2}{0!\, 1!}\, t^2 + \left( \frac{B_1 \ln 2}{1!\, 1!} + \frac{B_0 \ln 3}{0!\, 2!} \right) t^3 + \left( \frac{B_2 \ln 2}{2!\, 1!} + \frac{B_1 \ln 3}{1!\, 2!} + \frac{B_0 \ln 4}{0!\, 3!} \right) t^4 + \cdots \tag{30}$$

Since equal power series have equal Taylor coefficients, I conclude that formula (17) is valid. Formulae (16) follow from (20), (5), (6), and (7).

   Using the first and second derivatives of equation (11) at s = 0, I find σ(q_{3,r}) (here t = r − 3):

$$\sigma(q_{3,r}) = \left( -\frac{4 B_{t+1}^2 \ln^2 2}{B_{t+3}^2} + \frac{8 B_{t+1} B_{t+2} \ln^2 2}{B_{t+3}^2} - \frac{4 B_{t+2}^2 \ln^2 2}{B_{t+3}^2} - \frac{4 B_{t+1} \ln^2 2}{3 B_{t+3}} + \frac{4 B_{t+2} \ln^2 2}{3 B_{t+3}} + \frac{4 B_{t+1}^2 \ln 2 \ln 3}{B_{t+3}^2} - \frac{4 B_{t+1} B_{t+2} \ln 2 \ln 3}{B_{t+3}^2} - \frac{B_{t+1}^2 \ln^2 3}{B_{t+3}^2} + \frac{B_{t+1} \ln^2 3}{B_{t+3}} \right)^{\!1/2} \tag{31}$$

3.2     Bell Trials

I introduce a sequence of numbers

$$p_i = \frac{(r-1)!\, B_i}{B_r\, i!\, (r - i - 1)!}, \qquad i = 0, \ldots, r - 1. \tag{32}$$

The sequence $p = (p_0, \ldots, p_{r-1})$ satisfies $p_i \ge 0$ and $\sum_{i=0}^{r-1} p_i = 1$; this follows from the recursion formula $\sum_{i=0}^{r-1} \frac{(r-1)!\, B_i}{i!\, (r - i - 1)!} = B_r$ [22]. I refer to a random variable ξ with this probability distribution as Bell trials. Note that the average of $\ln(r - \xi)$ is equal to $\omega(r, 1)/(r B_r)$.
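   The Bell-trial distribution is cheap to compute, and the identity $E \ln(r - \xi) = \omega(r, 1)/(r B_r) = \beta_r$ gives a quick cross-check against Table 2. A sketch reusing bell_list from the Section 1.3.2 snippet:

```python
import math
from fractions import Fraction

def bell_trials(r):
    """The Bell-trial distribution (32), as exact rationals."""
    B = bell_list(r)   # bell_list from the Section 1.3.2 sketch
    ps = [Fraction(math.factorial(r - 1) * B[i],
                   B[r] * math.factorial(i) * math.factorial(r - i - 1))
          for i in range(r)]
    assert sum(ps) == 1   # exactly the recursion formula for B_r
    return ps

r = 9
ps = bell_trials(r)
print(sum(float(q) * math.log(r - i) for i, q in enumerate(ps)))  # ~0.8258 = beta_9
```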

Proposition 3.3.

1. $\sum_{i=0}^{r-1} (r - i)\, p_i = \mu_{r-1} = \frac{(r-1) B_{r-1} + B_r}{B_r}$

2. $\sum_{i=0}^{r-1} (r - i)^2\, p_i = \frac{(r-2)(r-1) B_{r-2} + 3(r-1) B_{r-1} + B_r}{B_r}$

Proof. I will compute the generating function of $S_r(x) = \sum_{i=0}^{r} \frac{r!\, B_i\, x^{r-i+1}}{i!\, (r-i)!}$ instead. Note that $S_r'(x)\big|_{x=1} = B_r\, \mu_r$.

$$\sum_{r=0}^{\infty} \frac{S_r(x)\, t^r}{r!} = \sum_{r=0}^{\infty} \frac{1}{r!} \sum_{a+b=r} \frac{(a+b)!\, B_a\, t^a\, x^{b+1}\, t^b}{a!\, b!} = \sum_{a=0}^{\infty} \frac{B_a t^a}{a!} \sum_{b=0}^{\infty} \frac{x (xt)^b}{b!} = e^{e^t - 1}\, x e^{xt} = x e^{e^t - 1 + xt}. \tag{33}$$


I factored the generating function into two series, which conveniently simplified into exponential expressions. Now that I have a closed expression for the generating function, I differentiate it:

$$\frac{\partial}{\partial x}\left[ x e^{e^t - 1 + xt} \right]\Big|_{x=1} = (xt + 1)\, e^{e^t - 1 + xt}\Big|_{x=1} = (t + 1)\, e^{e^t - 1 + t}. \tag{34}$$

Note that the function $\sum_{k \ge 1} \frac{B_k t^{k-1}}{(k-1)!}$ (compare it with formula (28)) is equal to $\left( e^{e^t - 1} \right)' = e^{e^t - 1 + t}$, which implies

$$t e^{e^t - 1 + t} + e^{e^t - 1 + t} = \sum_{k \ge 2} \frac{(k-1) B_{k-1} t^{k-1}}{(k-1)!} + \sum_{k \ge 1} \frac{B_k t^{k-1}}{(k-1)!} \tag{35}$$

and the formula for $\mu_{r-1}$.
   The second moment $\sum_{i=0}^{r-1} (r - i)^2 p_i$ can be computed with the same method, differentiating once more. The factorial generating function of the second moments is

$$\frac{\partial}{\partial x}\left[ x(xt + 1)\, e^{e^t - 1 + xt} \right]\Big|_{x=1} = (t^2 x^2 + 3tx + 1)\, e^{e^t - 1 + xt}\Big|_{x=1} = (t^2 + 3t + 1)\, e^{e^t - 1 + t} = \sum_{k \ge 3} \frac{(k-2)(k-1) B_{k-2} t^{k-1}}{(k-1)!} + \sum_{k \ge 2} \frac{3(k-1) B_{k-1} t^{k-1}}{(k-1)!} + \sum_{k \ge 1} \frac{B_k t^{k-1}}{(k-1)!}, \tag{36}$$

and the second formula follows by comparing coefficients.








Theorem 3.4 (Jensen's Inequality [6], [18]). For any concave function $f : \mathbb{R} \to \mathbb{R}$ and any weights $q_i \in \mathbb{R}_{>0}$ with $\sum_{i=1}^{r} q_i = 1$, the inequality

$$\sum_{i=1}^{r} f(i)\, q_i \le f\left( \sum_{i=1}^{r} i\, q_i \right)$$

holds.


   I want to apply this theorem to the concave function ln x:

Corollary 3.5. With $p_i$ as in (32), there is an inequality

$$\sum_{i=0}^{r-1} \ln(r - i)\, p_i < \ln\left( 1 + \frac{(r-1) B_{r-1}}{B_r} \right).$$

Proof. This follows from Jensen's Inequality and Proposition 3.3, part 1.

Proposition 3.6. $\lim_{r \to \infty} \gamma_r = 1$.

Proof. Corollary 3.5 implies

$$\gamma_r = \frac{1}{1 - \frac{\omega(r,1)}{r B_r \ln r}} < \frac{1}{1 - \frac{\ln\left( 1 + \frac{(r-1) B_{r-1}}{B_r} \right)}{\ln r}}. \tag{37}$$

It is easy to see that $B_r \ge 2 B_{r-1}$, since I always have a choice of whether or not to add r to the same part as r − 1. This implies that $\frac{B_{r-1}}{B_r} \le \frac{1}{2^{r-1}}$. From this I conclude that

$$1 \le \gamma_r < \frac{1}{1 - \frac{\ln\left( 1 + \frac{r-1}{2^{r-1}} \right)}{\ln r}}$$

and $\lim_{r \to \infty} \gamma_r = 1$.



4    Discussion and Conclusion

I was not able to prove all the conjectures I made. I have made some steps (Proposition 3.6) toward the proof of Corollary 1.8 of Conjecture 1.6, namely that the ratio of the maximum entropy to the average entropy is close to one. Another fact I found is that the difference between the maximum entropy and the average entropy of partitions grows slowly as #X increases; I conjecture that the difference has magnitude ln ln #X. I have also computed the standard deviation in formula (31), which is conjecturally the greatest value of σ(p).



   My short-term goal is to prove these conjectures. The more challenging goal is to add dynamics to the system being studied and to identify the self-organizing subsystems among the low-entropy subsystems.

   I am grateful to my mentor, Dr. Rostislav Matveyev, for his support and training during this research.



References

 [1] W. R. Ashby. An Introduction To Cybernetics. John Wiley and Sons Inc, 1966.

 [2] R. Beebe. Jupiter the Giant Planet. Smithsonian Books, Washington, 2 edition, 1997.

 [3] C. Bettstetter and C. Gershenson, editors. Self-Organizing Systems, volume 6557 of Lecture

    Notes in Computer Science. Springer, 2011.

 [4] W. Bryc. The Normal Distribution: Characterizations with Applications. Springer-Verlag,

    1995.

 [5] R. Clausius. Über die Wärmeleitung gasförmiger Körper. Annalen der Physik, 125:353–400, 1865.

 [6] T.M. Cover and J.A. Thomas. Elements of information theory. Wiley, 1991.

 [7] S.R. de Groot and P. Mazur. Non-Equilibrium Thermodynamics. Dover, 1984.

 [8] D. P. Kroese, T. Taimre, and Z. I. Botev. Handbook of Monte Carlo Methods. John Wiley & Sons, New York, 2011.

 [9] H. Haken. Synergetics: Introduction and Advanced Topics. Springer, Berlin, 2004.

[10] S.A. Kauffman. The Origins of Order. Oxford University Press, 1993.

[11] Mathematica. www.wolfram.com.

[12] H. Meinhardt. Models of biological pattern formation: from elementary steps to the organization of embryonic axes. Curr. Top. Dev. Biol., 81:1–63, 2008.

[13] D. Morrison. Exploring Planetary Worlds. W. H. Freeman, 1994.



[14] I. Prigogine. Non-Equilibrium Statistical Mechanics. Interscience Publishers, 1962.

[15] I. Prigogine. Introduction to Thermodynamics of Irreversible Processes. John Wiley and Sons,

    1968.

[16] I. Prigogine and G. Nicolis. Self-Organization in Nonequilibrium Systems: From Dissipative

    Structures to Order through Fluctuations. John Wiley and Sons, 1977.

[17] G.C. Rota. The number of partitions of a set. American Mathematical Monthly, 71(5):498–504,

    1964.

[18] W. Rudin. Real and Complex Analysis. McGraw-Hill, 1987.

[19] E. Schrödinger. What Is Life? Cambridge University Press, 1992.

[20] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 1948.

[21] N. J. A. Sloane and others. The On-Line Encyclopedia of Integer Sequences, oeis.org.

[22] R. P. Stanley. Enumerative Combinatorics, volumes 1 and 2. Cambridge University Press, 1997.




                                                21

More Related Content

PDF
The computational limit_to_quantum_determinism_and_the_black_hole_information...
PDF
Geometrical control theory
PDF
Common Fixed Theorems Using Random Implicit Iterative Schemes
PDF
On the Application of the Fixed Point Theory to the Solution of Systems of Li...
PDF
Application of stochastic lognormal diffusion model with
PDF
International Journal of Mathematics and Statistics Invention (IJMSI)
PDF
Thesis_Eric Eun Seuk Choi
PDF
Invariant Manifolds, Passage through Resonance, Stability and a Computer Assi...
The computational limit_to_quantum_determinism_and_the_black_hole_information...
Geometrical control theory
Common Fixed Theorems Using Random Implicit Iterative Schemes
On the Application of the Fixed Point Theory to the Solution of Systems of Li...
Application of stochastic lognormal diffusion model with
International Journal of Mathematics and Statistics Invention (IJMSI)
Thesis_Eric Eun Seuk Choi
Invariant Manifolds, Passage through Resonance, Stability and a Computer Assi...

What's hot (11)

PDF
organisms are open self-organizing thermodynamic systems with a low entropy. These open systems are part of a large closed system S. Since I am interested in open self-organizing thermodynamic systems, it is important to know the number of subsystems within S that have low entropy. In my work I studied this question from the mathematical point of view. In my simplified approach the configuration space of S is a finite set X with a probability distribution, and a subsystem is a partition of X. I studied a function that, for a given x, counts the number of partitions of X whose entropy does not exceed x. My approach is rather general because any configuration space can be approximated by a sufficiently large but finite set.

The controversy between classical biology and physics has a long history. It revolves around the paradox that physical processes are reversible while biological processes are not. Boltzmann, in the process of working on this dilemma, laid the foundation of statistical physics. He put forward the notion of entropy, which characterizes the degree of disorder in a statistical system. The second law of thermodynamics in Boltzmann's formulation states that the entropy of a closed system cannot decrease, which makes time in a statistical system irreversible.

The solution of the problem of the irreversibility of time did not completely eliminate the contradiction. The second law of thermodynamics seems to forbid the long-term existence of organized systems such as living organisms. Schrödinger in his book [19] (Chapter 6) pointed out that the entropy can go down in an open system, that is, a system that can exchange mass and energy with its surroundings. Prigogine in his groundbreaking works [15, 14, 16] showed that self-organization (a decrease of entropy) can be achieved dynamically. His discovery laid the foundation of non-equilibrium statistical mechanics. The most interesting self-organizing systems exist far away from equilibrium and are non-static by their nature. There is a vast literature on self-organization (see e.g. [16, 10, 9, 12] and the references therein).
Current research is focused on the detailed study of individual examples of self-organization and is very successful (see e.g. [3]). In this work I changed the perspective. My motivating problem was rather general: to estimate the total number of self-organizing subsystems in a thermodynamically closed system. Self-organizing subsystems are the most interesting specimens of the class of subsystems with a low entropy. This motivates my interest in estimating the number of subsystems with a low entropy; knowing this number, the number of self-organizing subsystems can be assessed. A problem posed in such generality looks very hard, so I made a series of simplifications that let me progress in this direction.

Ashby in [1] argued that any system S can be thought of as a "machine". His idea is that the configuration space of S can be approximated by a set or alphabet X, and the dynamics is given by a transition rule T_X : X → X. A homomorphism between machines S = (X, T_X) and Q = (Z, T_Z) is a map ψ : X → Z such that ψT_X = T_Z ψ. Homomorphisms are useful in the analysis of complicated systems (see [1] for details). A submachine, according to [1], is a subset X' ⊂ X that is invariant with respect to T_X; I never use this definition in this paper. In my definition a submachine is a homomorphic image ψ : (X, T_X) → (Z, T_Z). For example, if a machine (X, T) consists of N non-interacting submachines (X_1, T_1), ..., (X_N, T_N), then X = X_1 × ··· × X_N and T = T_1 × ··· × T_N. The projections ψ_i(x_1, ..., x_N) = x_i are homomorphisms of machines. This reflects the fact that the configuration space of a union of non-interacting systems is a product (not a union) of the configuration spaces of the components.

Definition 1.1. A collection of subsets Y = {Y_z | z ∈ Z} such that Y_z ∩ Y_{z'} = ∅ for z ≠ z' and ∪_{z∈Z} Y_z = X is a partition of a finite set X, r = #X. Let k_z be the cardinality #Y_z. In this paper I shall use the notation X = ⊔_z Y_z.

Any homomorphism ψ : (X, T_X) → (Z, T_Z) defines a partition of X with Y_z equal to {x ∈ X | ψ(x) = z}. In fact, up to relabeling the elements of Z, a homomorphism is the same thing as a partition. This also explains why I am interested in counting partitions; a sketch of such an enumeration is given below. Ashby in [1] argued that a machine (X, T) is a limiting case of a more realistic Markov process, in which the deterministic transition rules x → T(x) are replaced by random transition rules x → T̃(x). The dynamics of the process is completely determined by the probabilities {p_{x',x} | x, x' ∈ X} of passing from the state x to the state x' and the initial probability distribution {p_x | x ∈ X}. Markov processes have been studied in the theory of information, developed originally in [20].
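The counting that drives everything below is easy to make concrete for a small X. Here is a minimal Python sketch of the enumeration (my computations were done in Mathematica; this rendering and its function name `partitions` are purely illustrative):

```python
def partitions(xs):
    """Yield every partition of the list xs as a list of blocks."""
    if not xs:
        yield []
        return
    first, rest = xs[0], xs[1:]
    for part in partitions(rest):
        # insert `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or let `first` start a block of its own
        yield part + [[first]]

print(sum(1 for _ in partitions([1, 2, 3, 4, 5])))  # 52, the Bell number B_5
```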
Yet there is still another way to interpret the quantities that I would like to compute. A submachine can also be interpreted as a scientific device. This can be understood through the example of a hurricane on Jupiter [2]. One can analyze the hurricane in a multitude of ways: visually through the lenses of a telescope, by recording the fluctuations of winds with a probe, or by capturing the fluctuations of the magnetic field around the hurricane. Every method of analysis (device) gives statistical data that yields, in turn, a respective entropy. If (X, p) is a space of states of the hurricane, then ψ : X → Z is a function whose set of values is the set of readings of the scientific device. It automatically leads to a partition of X, as was explained above. The list of known scientific methods in planetary science is enormous [13], and any additional method contributes something to the knowledge. Yet a full understanding of the subject would only be possible if one used all possible methods (all ψ's). This, however, is not going to happen in planetary science in the near future: the set of states X of the Jupiter atmosphere is colossal, which makes the set of all conceivable methods of its study (devices) even bigger. Still, imagine that all the mentioned troubles were nonexistent. It would be interesting to count the number of scientific devices that yield statistical data about the hurricane with entropy no greater than a given value. It would also be interesting to know their average entropy. This is a dream; I did just that in my oversimplified model.

1.2 Research Problem

In the following, the set X will be {1, ..., r}. Let p be a probability distribution on X, that is, a collection of numbers p_i ≥ 0 such that $\sum_{i=1}^{r} p_i = 1$. The array p = (p_1, ..., p_r) is said to be a probability vector. The probability of Y_i in the partition X = ⊔_i Y_i is $p(Y_i) = \sum_{j \in Y_i} p_j$.

Definition 1.2. The entropy of a partition Y, H(p, Y), is calculated by the expression

$$H(p, Y) = -\sum_{i=1}^{l} p(Y_i) \ln p(Y_i).$$

In this definition the function x ln x is extended to x = 0 by continuity: 0 ln 0 = 0. Here are some examples of entropies: $H(p, Y_{max}) = -\sum_{i=1}^{r} p_i \ln p_i$ for Y_max = {{1}, ..., {r}}, and H(p, Y_min) = 0 for Y_min = {{1, ..., r}}. A short computational sketch of this definition is given below.
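A sketch of Definition 1.2 (again illustrative Python rather than the original Mathematica; a partition is represented as a list of blocks):

```python
import math

def entropy(p, part):
    """H(p, Y) = -sum_i p(Y_i) ln p(Y_i), with 0 ln 0 = 0."""
    h = 0.0
    for block in part:
        q = sum(p[x - 1] for x in block)  # p(Y_i); elements are numbered from 1
        if q > 0.0:
            h -= q * math.log(q)
    return h

p = [0.5, 0.25, 0.25]
print(entropy(p, [[1], [2], [3]]))  # H(p, Ymax) = 1.0397...
print(entropy(p, [[1, 2, 3]]))      # H(p, Ymin) = 0.0
```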
One of the properties of the entropy function (see [6]) is that

$$H(p, Y_{min}) \le H(p, Y) \le H(p, Y_{max}) \quad \text{for any } Y \in P_r. \quad (1)$$

It is clear from the previous discussion that Θ(p, x) = #{Y ∈ P_r | H(p, Y) ≤ x} is identical to the function defined in the abstract. The Bell number B_r ([22], [17]) is the cardinality of P_r. The value Θ(p, H(p, Y_max)) coincides with B_r thanks to (1). From this I conclude that

$$\theta(p, x) = \frac{\#\{Y \in P_r \mid H(p, Y) \le x\}}{B_r}$$

is the function defined in the abstract. My main goal is to find a simple approximation to θ(p, x); a brute-force computation of θ for small r is sketched below.

1.3 Hypothesis

In this section I formulate the conjectures that I obtained with the computing software Mathematica [11].

Remark 1.3. I equip the set P_r with the probability distribution P such that P(Y) = 1/B_r for every Y ∈ P_r. The value of the function θ(p, x) is then the probability that a random partition Y has entropy ≤ x. This explains the adjective "random" in the title of the paper.

In order to state the main result I need to set notation:

$$p[k] = (p_1, \ldots, p_r, \underbrace{0, \ldots, 0}_{k}) \quad (2)$$

where p = (p_1, ..., p_r) is the probability vector. From the set of moments of the entropy of a random partition,

$$E(H^l(p, Y)) = \frac{1}{B_r} \sum_{Y \in P_r} H^l(p, Y), \quad (3)$$

I will use the first two to define the average µ(p) = E(H(p, Y)) and the standard deviation $\sigma(p) = \sqrt{E(H(p, Y)^2) - E(H(p, Y))^2}$.
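For small r the function θ(p, x) can be computed by brute force. A sketch, reusing `partitions` and `entropy` from the earlier illustrative sketches:

```python
def theta(p, x):
    """theta(p, x) = #{Y : H(p, Y) <= x} / B_r, by full enumeration."""
    parts = list(partitions(list(range(1, len(p) + 1))))
    return sum(1 for Y in parts if entropy(p, Y) <= x) / len(parts)

p = [0.5, 0.25, 0.25]
print(theta(p, 0.0))   # 0.2: only Ymin among the B_3 = 5 partitions
print(theta(p, 1.04))  # 1.0: every partition has entropy <= H(p, Ymax)
```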
Conjecture 1.4. Let p be a probability distribution on {1, ..., r}. Then

$$\lim_{k \to \infty} \left[ E(H^l(p[k], Y)) - \frac{1}{\sqrt{2\pi\sigma}} \int_{-\infty}^{\infty} x^l e^{-\frac{(x-\mu)^2}{2\sigma}}\, dx \right] = 0$$

with µ = µ(p[k]), σ = σ(p[k]), and for any integer l ≥ 0.

Practically this means that the cumulative normal distribution

$$\mathrm{Erf}(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma}} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma}}\, dt$$

with µ = µ(p[k]), σ = σ(p[k]) makes a good approximation to θ(p[k], x) for large k.

The initial study of the function θ(p, x) was done with the help of Mathematica. The software can effectively compute the quantities associated with a set X whose cardinality does not exceed ten.

1.3.1 General properties of θ(p, x)

The plots of some typical graphs, produced with Mathematica, are presented in Figure 1.1.

[Figure 1.1: Graphs of θ(p, x) and θ(q, x). The continuous line corresponds to θ(p, x) with p = (0.082, 0.244, 0.221, 0.093, 0.052, 0.094, 0.079, 0.130); the step function corresponds to q = (1/8, ..., 1/8).]

Large steps are common for θ(q, x) when q has symmetries.
A symmetry of q is a permutation τ of X such that q_{τ(x)} = q_x for all x ∈ X. Indeed, if I take a symmetry and act with it upon a partition, I get another partition with the same entropy. In this way I can produce many partitions with equal entropies; hence the high steps in the graph.

The effect of the operation p → p[1] (2) on θ(p, x) is surprising. Here are the typical graphs:

[Figure 1.2: Graphs of θ(p, x), θ(p[1], x), θ(p[2], x) for some randomly chosen p = (p_1, ..., p_6).]

The reader can see that the graphs have the same bending patterns and lie one over the other. This led me to put forth a conjecture that has passed multiple numerical tests (a spot-check is sketched below).

Conjecture 1.5. For any p, θ(p, x) ≥ θ(p[1], x).

A procedure that plots θ(p, x) is hungry for computer memory. This is why it is worthwhile to find a function that makes a good approximation. I have already mentioned in the introduction that Erf(x, µ(p), σ(p)) approximates θ(p, x) well. For example, if

p = (0.138, 0.124, 0.042, 0.106, 0.081, 0.131, 0.088, 0.138, 0.154), (4)

the picture below indicates a good agreement of the graphs.
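A numerical spot-check of Conjecture 1.5 (a sketch reusing `theta` from above; it samples random probability vectors and proves nothing):

```python
import random

for _ in range(20):
    p = [random.random() for _ in range(5)]
    s = sum(p)
    p = [v / s for v in p]          # normalize to a probability vector
    for x in (0.4, 0.8, 1.2, 1.6):  # a few test thresholds
        assert theta(p, x) >= theta(p + [0.0], x) - 1e-12
print("no counterexample found")
```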
[Figure 1.3: Erf(x, µ(p), σ(p)) (red) vs. θ(p, x) (blue), with p as in (4).]

The reader will find more precise relations between Erf and θ in the following sections.

1.3.2 Functions µ(p) and σ(p)

The good agreement of the graphs of Erf(x, µ(p), σ(p)) and θ(p, x) raises the question of a detailed analysis of the functions µ(p) and σ(p). It turns out that quantities more manageable than µ(p) are

$$\beta(p) = H(p, Y_{max}) - \mu(p), \qquad \gamma(p) = \frac{H(p, Y_{max})}{\mu(p)}. \quad (5)$$

The inequality (1) implies that µ(p) ≤ H(p, Y_max), so β(p) ≥ 0 and γ(p) ≥ 1. Evaluation of the denominator of γ(p) with formula (3) requires intensive computing. On my slow machine I used the Monte Carlo approximation [8]

$$\mu(p) \approx \frac{1}{k} \sum_{i=1}^{k} H(p, Y^i)$$

where the Y^i are independent random partitions; a sketch of such a sampler is given below. Below are the graphs of β(p_1, p_2, p_3) and γ(p_1, p_2, p_3) plotted in Mathematica. The reader can distinctly see one maximum in the center, corresponding to p = (1/3, 1/3, 1/3).

[Figure 1.4: The plot of β(p_1, p_2, 1 − p_1 − p_2). Figure 1.5: The plot of γ(p_1, p_2, 1 − p_1 − p_2).]

A closer look at the plot shows that γ(p_1, p_2, p_3) is not a concave function.
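The paper does not spell out how the independent random partitions Y^i were drawn; one standard choice is Stam's algorithm, which produces uniformly distributed set partitions. A sketch under that assumption (`bell_numbers` computes B_0, ..., B_n with the Bell triangle; `entropy` is reused from above):

```python
import math
import random

def bell_numbers(n):
    """Return [B_0, ..., B_n] via the Bell triangle."""
    B, row = [1], [1]
    for _ in range(n):
        new_row = [row[-1]]
        for v in row:
            new_row.append(new_row[-1] + v)
        row = new_row
        B.append(row[0])
    return B

def random_partition(r):
    """Uniform random partition of {1, ..., r} (Stam's algorithm)."""
    Br = bell_numbers(r)[r]
    # Sample the number of urns u with P(u) = u^r / (e * u! * B_r);
    # these probabilities sum to 1 by Dobinski's formula.
    target = random.random() * math.e * Br
    u, acc = 0, 0.0
    while acc < target:
        u += 1
        acc += u ** r / math.factorial(u)
    blocks = {}
    for elem in range(1, r + 1):  # drop r labelled balls into the u urns
        blocks.setdefault(random.randrange(u), []).append(elem)
    return list(blocks.values())  # empty urns are discarded automatically

def mu_monte_carlo(p, n=20000):
    """Monte Carlo estimate of mu(p) = E(H(p, Y))."""
    return sum(entropy(p, random_partition(len(p))) for _ in range(n)) / n
```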
In the following, h_r stands for the probability vector (1/r, ..., 1/r). I came up with a conjecture which has been numerically tested for r ≤ 9:

Conjecture 1.6. The function γ(p_1, ..., p_r) can be extended by continuity to all distributions p. In this bigger domain it satisfies

$$1 \le \gamma(p) \le \gamma(h_r) \overset{def}{=} \gamma_r. \quad (6)$$

Likewise the function β satisfies

$$0 \le \beta(p) \le \beta(h_r) \overset{def}{=} \beta_r. \quad (7)$$

The reader should consult the sections below for alternative ways of computing β_r and γ_r; a brute-force computation of γ_r for small r is also sketched below. The following table contains an initial segment of the sequence {γ_r}.

Table 1: Values of γ_r.

  r   | 2 | 3     | 4     | 5     | 6     | 7     | 8     | 9     | ... | 100   | ... | 1000
  γ_r | 2 | 1.826 | 1.739 | 1.691 | 1.659 | 1.635 | 1.617 | 1.602 | ... | 1.426 | ... | 1.341

I see that it is a decreasing sequence. Extensive computer tests have led me to the following conjecture.

Conjecture 1.7. The sequence {γ_r} satisfies γ_r ≥ γ_{r+1} and lim_{r→∞} γ_r = 1.

The limit statement is proved in Proposition 3.6.

Corollary 1.8. lim_{t→∞} γ(p[t]) = 1.

Proof. From Conjecture 1.6 I conclude that 1 ≤ γ(p[t]) ≤ γ_{r+t}. Since lim_{t→∞} γ_{r+t} = 1 by Conjecture 1.7, lim_{t→∞} γ(p[t]) = 1.

Table 2: Values of β_r.

  r   | 6        | 7        | 8        | 9        | ... | 100    | ...
  β_r | 0.711731 | 0.756053 | 0.793492 | 0.825835 | ... | 1.3943 | ...
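The first entries of Table 1 can be reproduced by enumeration. A sketch reusing `partitions` and `entropy` from above:

```python
import math

def gamma_r(r):
    """gamma_r = ln(r) / mu(h_r), with mu computed by full enumeration."""
    h = [1.0 / r] * r
    parts = list(partitions(list(range(1, r + 1))))
    mu = sum(entropy(h, Y) for Y in parts) / len(parts)
    return math.log(r) / mu

for r in range(2, 6):
    print(r, round(gamma_r(r), 3))  # 2 2.0, 3 1.826, 4 1.739, 5 1.691
```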
Conjecture 1.9. The sequence {β_r} satisfies β_r ≤ β_{r+1} and lim_{r→∞} β_r = ∞.

The situation with the standard deviation σ(p) is a bit more complicated. Here is a graph of σ(p_1, p_2, p_3).

[Figure 1.6: Three-dimensional view of the graph of the standard deviation σ(p_1, p_2, p_3) for θ(p, x).]

The reader can clearly see four local maxima. The function σ(p_1, p_2, p_3) is symmetric. The maxima correspond to the point (1/3, 1/3, 1/3) and the permutations of (1/2, 1/2, 0). This led me to think that the local maxima of σ(p_1, ..., p_r) are the permutations of q_{k,r} = h_k[r − k], k ≤ r. I tabulated the values of σ(q_{k,r}) for small k and r in the table below; a sketch reproducing some of its entries follows the table.

Table 3: Values of σ(q_{k,r}).

  k \ r | 3      | 4      | 5      | 6      | 7      | 8      | 9
  2     | 0.3396 | 0.3268 | 0.314  | 0.3026 | 0.2924 | 0.2832 | 0.275
  3     | 0.35   | 0.3309 | 0.3173 | 0.3074 | 0.2992 | 0.292  | 0.286
  4     | -      | 0.3254 | 0.309  | 0.298  | 0.29   | 0.283  | 0.278
  5     | -      | -      | 0.302  | 0.289  | 0.28   | 0.273  | 0.267
  6     | -      | -      | -      | 0.283  | 0.272  | 0.265  | 0.258
  7     | -      | -      | -      | -      | 0.267  | 0.258  | 0.251
  8     | -      | -      | -      | -      | -      | 0.254  | 0.246
  9     | -      | -      | -      | -      | -      | -      | 0.242

The reader can see that the third row (k = 3) has the largest value in each column. It is not hard to see analytically that q_{k,r} is a critical point of σ. My computer experiments led me to the following conjecture:

Conjecture 1.10. The function σ(p) has a global maximum at q_{3,r}.
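Entries of Table 3 can be spot-checked by enumeration. A sketch, again reusing the earlier helpers:

```python
def sigma(p):
    """Standard deviation of H(p, Y) over all partitions, by enumeration."""
    parts = list(partitions(list(range(1, len(p) + 1))))
    m1 = sum(entropy(p, Y) for Y in parts) / len(parts)
    m2 = sum(entropy(p, Y) ** 2 for Y in parts) / len(parts)
    return (m2 - m1 ** 2) ** 0.5

q35 = [1.0 / 3] * 3 + [0.0] * 2  # q_{3,5} = h_3[2]
print(round(sigma(q35), 4))      # 0.3173, as in Table 3
```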
1.3.3 The Generating Function of Momenta

In order to test Conjecture 1.4 I need an effective way of computing E(H^l(p[k], Y)) for large values of k. In this section I present my computations of E(H^l(p[k], Y)) for small r, which led me to a conjectural formula for E(H^l(p[k], Y)). The factorial generating function of the powers of the entropy can be written compactly this way:

$$G(p, Y, s) = \sum_{t=0}^{\infty} \frac{H(p, Y)^t s^t}{t!} = \sum_{t=0}^{\infty} \frac{\left(-\sum_{i=1}^{l} p(Y_i) \ln p(Y_i)\right)^t s^t}{t!} = \prod_{i=1}^{l} p(Y_i)^{-p(Y_i) s} \quad (8)$$

The function G(p, Y, s) can be extended from P_r to P_{r+1} in the following way. I extend the r-dimensional probability vector p to an (r+1)-dimensional vector p' by adding a zero coordinate. Any partition Y = {Y_1, ..., Y_l} defines a partition Y' = {Y_1, ..., Y_l, {r + 1}}. Note that G(p, Y, s) = G(p', Y', s). The following generating function, after normalization, encodes all the moments of the random partition:

$$J(p, s) = \sum_{Y \in P_r} G(p, Y, s), \qquad J(p, s)/B_r = \sum_{l \ge 0} E(H^l(p, Y))\, s^l / l! \quad (9)$$

I want to explore the effect of the substitution p → p[k] on J(p, s). I use the notation A_t(p, s) = J(p[t], s). Here are the results of my computer experiments. A vector of two non-zero probabilities extended by t zeros yields

$$A_t(p_1, p_2, -s) = B_{t+1} + (B_{t+2} - B_{t+1})\, p_1^{p_1 s} p_2^{p_2 s} \quad (10)$$
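Formula (10) can be checked against direct enumeration for small t. A sketch (reusing `partitions`, `entropy`, and `bell_numbers` from the earlier illustrative code):

```python
import math

p1, p2, s, t = 0.3, 0.7, 1.5, 2
p = [p1, p2] + [0.0] * t
# Left side: sum of exp(-s H) over all partitions, i.e. A_t(p1, p2, -s).
lhs = sum(math.exp(-s * entropy(p, Y))
          for Y in partitions(list(range(1, len(p) + 1))))
B = bell_numbers(t + 2)
rhs = B[t + 1] + (B[t + 2] - B[t + 1]) * p1 ** (p1 * s) * p2 ** (p2 * s)
print(abs(lhs - rhs) < 1e-9)  # True
```

The check passes because a partition contributes the factor p_1^{p_1 s} p_2^{p_2 s} exactly when the elements 1 and 2 lie in different blocks, and there are B_{t+2} − B_{t+1} such partitions; the remaining B_{t+1} partitions have entropy zero.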
The next is for three non-zero probabilities extended by zeros:

$$A_t(p_1, p_2, p_3, -s) = B_{t+1} + (B_{t+2} - B_{t+1}) \left[ (p_1+p_2)^{s(p_1+p_2)} p_3^{p_3 s} + (p_1+p_3)^{s(p_1+p_3)} p_2^{p_2 s} + (p_2+p_3)^{s(p_2+p_3)} p_1^{p_1 s} \right] + (B_{t+3} - 3B_{t+2} + 2B_{t+1})\, p_1^{p_1 s} p_2^{p_2 s} p_3^{p_3 s} \quad (11)$$

I found A_t(p, s) for probability vectors p with five or fewer coordinates. In order to generalize the results of my computation I have to fix some notation. With deg Y = deg{Y_1, ..., Y_l} = l, I set $J^l(p, s) = \sum_{\deg Y = l} G(p, Y, s)$. Then

$$A_t(p, s) = \sum_{l=1}^{k} L(l, t)\, J^l(p, s) \quad (12)$$

where the L(l, t) are some coefficients. For example, in the last line of formula (11) the coefficient L(3, t) is B_{t+3} − 3B_{t+2} + 2B_{t+1} and the function J^3(p, s) is p_1^{p_1 s} p_2^{p_2 s} p_3^{p_3 s}. The reader can see that the coefficients of J^l(p, s) in formulae (10) and (11) coincide. The coefficients of the Bell numbers in the formulae for L(l, t),

B_{t+1}
B_{t+2} − B_{t+1}
B_{t+3} − 3B_{t+2} + 2B_{t+1}
B_{t+4} − 6B_{t+3} + 11B_{t+2} − 6B_{t+1}
B_{t+5} − 10B_{t+4} + 35B_{t+3} − 50B_{t+2} + 24B_{t+1}

form a triangle. I took the constants 1; 1, −1; 1, −3, 2; 1, −6, 11, −6 and entered them into the Google search window. The result of the search led me to the sequence A094638, Stirling numbers of the first kind, in the On-Line Encyclopedia of Integer Sequences (OEIS [21]).

Definition 1.11. The unsigned Stirling numbers of the first kind are denoted by $\left[{n \atop k}\right]$. They count the number of permutations of n elements with k disjoint cycles [22].
Table 4: Values of the function L(l, t).

  l \ t | 1 | 2  | 3   | 4    | 5     | ...
  1     | 2 | 5  | 15  | 52   | 203   | ...
  2     | 3 | 10 | 37  | 151  | 674   | ...
  3     | 4 | 17 | 77  | 372  | 1915  | ...
  4     | 5 | 26 | 141 | 799  | 4736  | ...
  5     | 6 | 37 | 235 | 1540 | 10427 | ...

The rows of this table are the OEIS sequences A000110, A138378, A005494, A045379. OEIS provided me with the factorial generating function for these sequences:

Conjecture 1.12.

$$L(l, t) = \left[{l \atop l}\right] B_{t+l} - \left[{l \atop l-1}\right] B_{t+l-1} + \cdots + (-1)^{l+1} \left[{l \atop 1}\right] B_{t+1} \quad (13)$$

$$\sum_{t=0}^{\infty} \frac{L(l, t)\, z^t}{t!} = e^{lz + e^z - 1} \quad (14)$$

The identity (12) holds for all values of t. A numerical check of (13) against Table 4 is sketched below.
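A sketch (mine) of the check, reusing `bell_numbers` from above; the unsigned Stirling numbers of the first kind are computed by their standard recurrence:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling1(n, k):
    """Unsigned Stirling numbers of the first kind [n over k]."""
    if n == k:
        return 1
    if k < 1 or k > n:
        return 0
    return (n - 1) * stirling1(n - 1, k) + stirling1(n - 1, k - 1)

B = bell_numbers(12)  # B_0, ..., B_12

def L(l, t):
    """Formula (13): alternating Stirling-weighted sum of Bell numbers."""
    return sum((-1) ** (l - k) * stirling1(l, k) * B[t + k]
               for k in range(1, l + 1))

print([L(1, t) for t in range(1, 6)])  # [2, 5, 15, 52, 203], row 1 of Table 4
print([L(3, t) for t in range(1, 6)])  # [4, 17, 77, 372, 1915], row 3
```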
1.3.4 Discussion of Conjecture 1.4

Formula (12) simplifies the computation of E(H^l(p[k], Y)). Here is a sample computation of

$$D(p, l, k) = E(H^l(p[k], Y)) - \frac{1}{\sqrt{2\pi\sigma(p[k])}} \int_{-\infty}^{\infty} x^l e^{-\frac{(x-\mu(p[k]))^2}{2\sigma(p[k])}}\, dx$$

for p = (0.4196, 0.1647, 0.4156).

Table 5: Values of the function D(p, l, k).

  l \ k | 0       | 100     | 200     | 300     | 400     | 500
  3     | -0.0166 | -0.0077 | -0.0048 | -0.0036 | -0.0029 | -0.0024
  4     | -0.0474 | -0.0273 | -0.0173 | -0.0129 | -0.0104 | -0.0088
  5     | -0.0884 | -0.0617 | -0.0393 | -0.0294 | -0.0237 | -0.0200
  6     | -0.1467 | -0.1142 | -0.0726 | -0.0543 | -0.0438 | -0.0369

The reader can see that the functions k → D(p, l, k) have a minimum at some k, after which they increase toward zero.

1.4 Significance

There is a multitude of possible devices that can be used for the study of a remote system. While some devices convey a lot of information, others are inadequate. Surprisingly, the majority of devices (see Conjectures 1.6, 1.7, and 1.9) will measure an entropy very close to the actual entropy of the system. All that is asked of a device is that the map

$$\psi : X \to Z \quad (15)$$

be onto; Z is the set of readings of the device.

The cumulative Gauss distribution [4] makes a good approximation to θ(p, x). The only parameters that have to be known are the average µ and the standard deviation σ. This gives an effective way of making estimates of θ(p, x). The precise meaning of the estimates can be found in Conjecture 1.4.

My work offers a theoretical advance in the study of large complex systems through entropy analysis. The potential applications lie in sciences that deal with complex systems, such as economics, genetics, biology, paleontology, and psychology. My theory explains some hidden relations between the entropies of observed processes in a system. It can also give insight about the object of study from incomplete information. This is an important problem to solve and a valuable contribution to science, according to my mentor, who is an expert in this field.
2 Methods

All of the conjectures were obtained with the help of Mathematica. My main theoretical tool is the theory of generating functions [22].

Definition 2.1. Let a_k, k ≥ 0, be a sequence of numbers. The generating function corresponding to a_k is the formal power series $\sum_{k \ge 0} a_k t^k$.

My knowledge of Stirling numbers (see Definition 1.11) also comes from [22]. I also used Jensen's inequality (Theorem 3.4) [6].

3 Results

3.1 Computation of β_r and γ_r

The main result of this section is the explicit formulae for β_r (see formula (7)) and γ_r (see formula (6)):

$$\beta_r = \frac{\omega(r, 1)}{r B_r}, \qquad \gamma_r = \frac{1}{1 - \dfrac{\omega(r, 1)}{r B_r \ln r}} \quad (16)$$

where

$$\omega(r, 1) = r! \sum_{i=0}^{r-1} \frac{B_i \ln(r - i)}{i!\, (r - i - 1)!} \quad (17)$$

A sketch evaluating these formulae numerically is given below. I set some notation. The probability of Y_i is $\frac{k_i}{r} = \frac{\#Y_i}{r}$ and the entropy of Y is $H(Y) = H(h_r, Y) = -\sum_{i=1}^{l} \frac{k_i}{r} \ln \frac{k_i}{r}$. After some simplifications H(Y) becomes

$$H(Y) = \ln r - \frac{1}{r}\, \lambda(Y) \quad (18)$$

where

$$\lambda(Y) = \lambda(k_1, \ldots, k_l) = \ln\left(k_1^{k_1} k_2^{k_2} \cdots k_l^{k_l}\right) = \sum_{i=1}^{l} k_i \ln k_i \quad (19)$$
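The promised sketch, reusing `bell_numbers` from above; the printed values reproduce Tables 1 and 2:

```python
import math

def omega1(r):
    """Formula (17) for omega(r, 1)."""
    B = bell_numbers(r)
    return math.factorial(r) * sum(
        B[i] * math.log(r - i) / (math.factorial(i) * math.factorial(r - i - 1))
        for i in range(r))

def beta_gamma(r):
    """Formula (16): (beta_r, gamma_r)."""
    w = omega1(r) / (r * bell_numbers(r)[r])
    return w, 1.0 / (1.0 - w / math.log(r))

print(beta_gamma(2))  # beta_2 = 0.3466..., gamma_2 = 2.0
print(beta_gamma(9))  # beta_9 = 0.8258..., gamma_9 = 1.602...
```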
The average entropy is

$$E(H(h_r, Y)) = \ln r - \frac{\sum_{Y \in P_r} \lambda(Y)}{r B_r} \quad (20)$$

I am interested in calculating the sums

$$\omega(r, q) = \sum_{Y \in P_r} \lambda(Y)^q, \quad q \ge 0 \quad (21)$$

The factorial generating function of the powers λ(Y)^k is

$$\Lambda(Y, s) = \sum_{k=0}^{\infty} \frac{\lambda(Y)^k s^k}{k!} = k_1^{k_1 s} \cdots k_l^{k_l s} \quad (22)$$

I will compute the factorial generating function of the quantities

$$\Lambda(r, s) = \sum_{Y \in P_r} \Lambda(Y, s) \quad (23)$$

Theorem 3.1.

$$\sum_{r=0}^{\infty} \frac{\Lambda(r, s)\, t^r}{r!} = e^{F(s,t)} \quad (24)$$

where $F(s, t) = \sum_{r=1}^{\infty} \frac{r^{rs} t^r}{r!}$.

Proof.

$$e^{F(s,t)} = \sum_{l=0}^{\infty} \frac{F(s,t)^l}{l!} = \sum_{l=0}^{\infty} \frac{1}{l!} \left( \sum_{k=1}^{\infty} \frac{k^{ks} t^k}{k!} \right)^{\!l} = \sum_{l=0}^{\infty} \frac{1}{l!} \sum_{k_1=1}^{\infty} \cdots \sum_{k_l=1}^{\infty} \frac{k_1^{k_1 s} t^{k_1}}{k_1!} \cdots \frac{k_l^{k_l s} t^{k_l}}{k_l!} = \sum_{l=0}^{\infty} \frac{1}{l!} \sum_{k_1 \ge 1, \ldots, k_l \ge 1} \frac{k_1^{k_1 s} \cdots k_l^{k_l s}\, t^{k_1 + \cdots + k_l}}{k_1! \cdots k_l!} = \sum_{l=0}^{\infty} \frac{1}{l!} \sum_{1 \le k_1 \le \cdots \le k_l} \frac{l!}{c_1!\, c_2! \cdots} \cdot \frac{k_1^{k_1 s} \cdots k_l^{k_l s}\, t^{k_1 + \cdots + k_l}}{k_1! \cdots k_l!} \quad (25)$$
The coefficient c_i is the number of k's that are equal to i. After some obvious simplifications the formula above becomes

$$e^{F(s,t)} = \sum_{r=0}^{\infty} \frac{1}{r!} \left( \sum_{\substack{k_1 \le \cdots \le k_l \\ k_1 + \cdots + k_l = r}} \frac{r!\; k_1^{k_1 s} k_2^{k_2 s} \cdots k_l^{k_l s}}{c_1!\, c_2! \cdots\; k_1!\, k_2! \cdots k_l!} \right) t^r \quad (26)$$

Each partition Y determines a set of numbers k_i = #Y_i. I will refer to k_1, ..., k_l as the portrait of {Y_1, ..., Y_l}. Let me fix one collection of numbers k_1 ≤ ··· ≤ k_l; I can always assume that the sequence is non-decreasing. Let me count the number of partitions with the given portrait. If the subsets were ordered, the number of partitions would equal $\frac{(k_1 + k_2 + \cdots + k_l)!}{k_1!\, k_2! \cdots k_l!}$. In my case the subsets are unordered, and the number of such unordered partitions is $\frac{(k_1 + k_2 + \cdots + k_l)!}{k_1!\, k_2! \cdots k_l!\; c_1!\, c_2! \cdots}$, where c_i is the number of subsets of cardinality i. The function Λ(Y, s) depends only on the portrait of Y. From this I conclude that

$$\sum_{Y \in P_r} \Lambda(Y, s) = \sum_{\substack{k_1 \le k_2 \le \cdots \le k_l \\ k_1 + \cdots + k_l = r}} \frac{(k_1 + k_2 + \cdots + k_l)!\; k_1^{k_1 s} k_2^{k_2 s} \cdots k_l^{k_l s}}{k_1!\, k_2! \cdots k_l!\; c_1!\, c_2! \cdots} \quad (27)$$

which yields the proof.

Note that upon the substitution s = 0 formula (24) becomes the classical generating function

$$\sum_{k \ge 0} \frac{B_k t^k}{k!} = e^{e^t - 1} \quad (28)$$

(see [22]). This lets me find the generating function $\sum_{r=0}^{\infty} \frac{\omega(r,1)}{r!}\, t^r$; before stating the series identity, a brute-force check of the closed form (17) against the direct sum (21) is sketched below.
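The check (a sketch of mine, reusing `partitions` and `omega1` from the earlier sketches):

```python
import math

def omega1_direct(r):
    """omega(r, 1) = sum over all partitions Y of lambda(Y) = sum_i k_i ln k_i."""
    total = 0.0
    for Y in partitions(list(range(1, r + 1))):
        total += sum(len(block) * math.log(len(block)) for block in Y)
    return total

for r in range(2, 7):
    assert abs(omega1_direct(r) - omega1(r)) < 1e-9
print("formula (17) matches the direct sum over partitions")
```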
Proposition 3.2.

$$\sum_{r=0}^{\infty} \frac{\omega(r, 1)\, t^r}{r!} = e^{e^t - 1} \sum_{k=1}^{\infty} \frac{t^k \ln k}{(k-1)!} \quad (29)$$

Proof. Using equations (22), (23), and (21) I obtain

$$\frac{\partial}{\partial s}\left[\sum_{r=0}^{\infty} \frac{\Lambda(r, s)\, t^r}{r!}\right]_{s=0} = \sum_{r=0}^{\infty} \frac{t^r}{r!} \sum_{Y \in P_r} \lambda(Y) = \sum_{r=0}^{\infty} \frac{\omega(r, 1)\, t^r}{r!}.$$

Alternatively, I find the same partial derivative with the chain rule applied to the right-hand side of (24): $\frac{\partial}{\partial s}\left[e^{F(s,t)}\right]_{s=0} = e^{F(s,t)}\, \frac{\partial}{\partial s}\left[F(s,t)\right]_{s=0}$. Note that $F(s,t)\big|_{s=0} = e^t - 1$ and $\frac{\partial}{\partial s}\left[F(s,t)\right]_{s=0} = \sum_{k=1}^{\infty} \frac{t^k\, k \ln k}{k!}$. From this I infer that

$$\frac{\partial}{\partial s}\left[e^{F(s,t)}\right]_{s=0} = e^{e^t - 1} \sum_{k=1}^{\infty} \frac{t^k\, k \ln k}{k!},$$

which proves (29).

I want to find an explicit formula for ω(r, 1). To my advantage I know that $e^{e^t - 1} = \sum_{n=0}^{\infty} \frac{B_n t^n}{n!}$, where B_n is the Bell number, the number of unordered partitions of a set of n elements [22]. To find ω(r, 1) I expand equation (29):

$$\sum_{r=0}^{\infty} \frac{\omega(r,1)\, t^r}{r!} = \left(\sum_{n=0}^{\infty} \frac{B_n t^n}{n!}\right)\left(\sum_{k=1}^{\infty} \frac{\ln k\; t^k}{(k-1)!}\right) = \frac{B_0 \ln 2}{0!\,1!}\, t^2 + \left(\frac{B_1 \ln 2}{1!\,1!} + \frac{B_0 \ln 3}{0!\,2!}\right) t^3 + \left(\frac{B_2 \ln 2}{2!\,1!} + \frac{B_1 \ln 3}{1!\,2!} + \frac{B_0 \ln 4}{0!\,3!}\right) t^4 + \cdots \quad (30)$$

Since equal power series have equal Taylor coefficients, the expansion (30) yields the explicit formula (17), and formulae (16) follow from (20), (5), (6), and (7).

Using the first and second derivatives of equation (11) at s = 0 I find σ(q_{3,r}), with t = r − 3:

$$\sigma(q_{3,r}) = \left[ \frac{4}{3}\, \frac{B_{t+2} - B_{t+1}}{B_{t+3}}\, \ln^2 2 \;-\; 4\left(\frac{B_{t+2} - B_{t+1}}{B_{t+3}}\right)^{\!2} \ln^2 2 \;-\; 4\, \frac{B_{t+1}(B_{t+2} - B_{t+1})}{B_{t+3}^2}\, \ln 2 \ln 3 \;+\; \frac{B_{t+1}}{B_{t+3}}\left(1 - \frac{B_{t+1}}{B_{t+3}}\right) \ln^2 3 \right]^{1/2} \quad (31)$$

3.2 Bell Trials

I introduce a sequence of numbers

$$p_i = \frac{(r-1)!\, B_i}{B_r\; i!\, (r-i-1)!}, \quad i = 0, \ldots, r-1 \quad (32)$$
The sequence p = (p_0, ..., p_{r−1}) satisfies p_i ≥ 0 and $\sum_{i=0}^{r-1} p_i = 1$; this follows from the recursion formula $\sum_{i=0}^{r-1} \frac{(r-1)!\, B_i}{i!\,(r-i-1)!} = B_r$ [22]. I refer to a random variable ξ with this probability distribution as Bell trials. Note that the average of ln(r − ξ) is equal to ω(r, 1)/(r B_r).

Proposition 3.3.

1. $\sum_{i=0}^{r-1} (r - i)\, p_i = \mu_{r-1} = \dfrac{(r-1) B_{r-1} + B_r}{B_r}$

2. $\sum_{i=0}^{r-1} (r - i)^2\, p_i = \dfrac{(r-2)(r-1) B_{r-2} + 3(r-1) B_{r-1} + B_r}{B_r}$

Proof. I will compute the generating function of $S_r(x) = \sum_{i=0}^{r} \frac{r!\, B_i\, x^{r-i+1}}{i!\,(r-i)!}$ instead. Note that $S_r(x)'\big|_{x=1} = B_r \mu_r$.

$$\sum_{r=0}^{\infty} \frac{S_r(x)\, t^r}{r!} = \sum_{r=0}^{\infty} \frac{1}{r!} \sum_{a+b=r} \frac{(a+b)!\, B_a\, t^a\, x^{b+1}\, t^b}{a!\, b!} = \left(\sum_{a=0}^{\infty} \frac{B_a t^a}{a!}\right) \left(x \sum_{b=0}^{\infty} \frac{(xt)^b}{b!}\right) = e^{e^t - 1}\, x e^{xt} = x\, e^{e^t - 1 + xt} \quad (33)$$

I factored the generating function into two series, which very conveniently simplified into exponential expressions. Now that I have found the simplified expression for the generating function, I differentiate it:

$$\frac{\partial}{\partial x}\left[x\, e^{e^t - 1 + xt}\right]_{x=1} = (xt + 1)\, e^{e^t - 1 + xt}\Big|_{x=1} = (t + 1)\, e^{e^t - 1 + t} \quad (34)$$

Note that the function $\sum_{k \ge 1} \frac{B_k t^{k-1}}{(k-1)!}$ (compare it with formula (28)) is equal to $\left(e^{e^t - 1}\right)' = e^{e^t - 1 + t}$, which implies

$$t\, e^{e^t - 1 + t} + e^{e^t - 1 + t} = \sum_{k \ge 2} \frac{(k-1) B_{k-1} t^{k-1}}{(k-1)!} + \sum_{k \ge 1} \frac{B_k t^{k-1}}{(k-1)!} \quad (35)$$

and the formula for µ_{r−1} follows.

The second moment $\sum_{i=0}^{r-1} (r-i)^2 p_i$ can be computed with the same method; it is encoded by $x\,\big(x\, S_r(x)'\big)'$ at x = 1. The generating function with factorials of the second moments is

$$\frac{\partial}{\partial x}\left[x (xt + 1)\, e^{e^t - 1 + xt}\right]_{x=1} = (t^2 x^2 + 3tx + 1)\, e^{e^t - 1 + xt}\Big|_{x=1} = (t^2 + 3t + 1)\, e^{e^t - 1 + t} = \sum_{k \ge 3} \frac{(k-2)(k-1) B_{k-2} t^{k-1}}{(k-1)!} + \sum_{k \ge 2} \frac{3(k-1) B_{k-1} t^{k-1}}{(k-1)!} + \sum_{k \ge 1} \frac{B_k t^{k-1}}{(k-1)!} \quad (36)$$
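A numerical check (a sketch of mine, reusing `bell_numbers`) of the first identity in Proposition 3.3:

```python
import math

r = 7
B = bell_numbers(r)
# The Bell-trial probabilities (32).
p = [math.factorial(r - 1) * B[i] /
     (B[r] * math.factorial(i) * math.factorial(r - i - 1))
     for i in range(r)]
assert abs(sum(p) - 1.0) < 1e-12            # (32) is indeed a distribution
lhs = sum((r - i) * pi for i, pi in enumerate(p))
rhs = ((r - 1) * B[r - 1] + B[r]) / B[r]    # Proposition 3.3, part 1
print(abs(lhs - rhs) < 1e-12)               # True
```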
Theorem 3.4 (Jensen's Inequality [6], [18]). For any concave function f : R → R and a sequence q_i ∈ R_{>0} with $\sum_{i=1}^{r} q_i = 1$, the inequality

$$\sum_{i=1}^{r} f(i)\, q_i \le f\!\left(\sum_{i=1}^{r} i\, q_i\right)$$

holds.

I want to apply this theorem to the concave function ln x:

Corollary 3.5. With p_i as in (32), there is an inequality

$$\sum_{i=0}^{r-1} \ln(r - i)\, p_i < \ln\!\left(1 + \frac{(r-1) B_{r-1}}{B_r}\right)$$

Proof. Follows from Jensen's inequality and Proposition 3.3.

Proposition 3.6. $\lim_{r\to\infty} \gamma_r = 1$.

Proof. Corollary 3.5 implies

$$\gamma_r = \frac{1}{1 - \dfrac{\omega(r,1)}{r B_r \ln r}} < \frac{1}{1 - \dfrac{\ln\!\left(1 + \frac{(r-1) B_{r-1}}{B_r}\right)}{\ln r}} \quad (37)$$

It is easy to see that B_r ≥ 2B_{r−1}, since I always have a choice of whether or not to add r to the same part as r − 1. This implies that $\frac{B_{r-1}}{B_r} \le \frac{1}{2^{r-1}}$. From this I conclude that

$$1 \le \gamma_r < \frac{1}{1 - \dfrac{\ln\!\left(1 + \frac{r-1}{2^{r-1}}\right)}{\ln r}}$$

and lim_{r→∞} γ_r = 1.

4 Discussion and Conclusion

I was not able to prove all the conjectures I made. I have made some steps (Proposition 3.6) toward the proof of Corollary 1.8 of Conjecture 1.6, which says that the ratio of the maximum entropy to the average entropy is close to one. Another fact I found is that the difference between the maximum entropy and the average entropy of partitions grows slowly as #X increases; I conjecture that the difference has the magnitude ln ln #X. I have also computed the standard deviation, formula (31), which is conjecturally the greatest value of σ(p).
My short-term goal is to prove these conjectures. The more challenging goal is to add dynamics to the system being studied and to identify self-organizing subsystems among the low-entropy subsystems.

I am grateful to my mentor, Dr. Rostislav Matveyev, for his support and training during this research.

References

[1] W. R. Ashby. An Introduction to Cybernetics. John Wiley and Sons Inc, 1966.
[2] R. Beebe. Jupiter: the Giant Planet. Smithsonian Books, Washington, 2nd edition, 1997.
[3] C. Bettstetter and C. Gershenson, editors. Self-Organizing Systems, volume 6557 of Lecture Notes in Computer Science. Springer, 2011.
[4] W. Bryc. The Normal Distribution: Characterizations with Applications. Springer-Verlag, 1995.
[5] R. Clausius. Über die Wärmeleitung gasförmiger Körper. Annalen der Physik, 125:353–400, 1865.
[6] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.
[7] S. R. de Groot and P. Mazur. Non-Equilibrium Thermodynamics. Dover, 1984.
[8] D. P. Kroese, T. Taimre, and Z. I. Botev. Handbook of Monte Carlo Methods. John Wiley & Sons, New York, 2011.
[9] H. Haken. Synergetics: Introduction and Advanced Topics. Springer, Berlin, 2004.
[10] S. A. Kauffman. The Origins of Order. Oxford University Press, 1993.
[11] Mathematica. www.wolfram.com.
[12] H. Meinhardt. Models of biological pattern formation: from elementary steps to the organization of embryonic axes. Curr. Top. Dev. Biol., 81:1–63, 2008.
[13] D. Morrison. Exploring Planetary Worlds. W. H. Freeman, 1994.
[14] I. Prigogine. Non-Equilibrium Statistical Mechanics. Interscience Publishers, 1962.
[15] I. Prigogine. Introduction to Thermodynamics of Irreversible Processes. John Wiley and Sons, 1968.
[16] I. Prigogine and G. Nicolis. Self-Organization in Nonequilibrium Systems: From Dissipative Structures to Order through Fluctuations. John Wiley and Sons, 1977.
[17] G.-C. Rota. The number of partitions of a set. American Mathematical Monthly, 71(5):498–504, 1964.
[18] W. Rudin. Real and Complex Analysis. McGraw-Hill, 1987.
[19] E. Schrödinger. What Is Life? Cambridge University Press, 1992.
[20] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 1948.
[21] N. J. A. Sloane and others. The On-Line Encyclopedia of Integer Sequences, oeis.org.
[22] R. P. Stanley. Enumerative Combinatorics, volumes 1–2. Cambridge University Press, 1997.