Jekyll2019-04-29T01:26:34-04:00https://www.adrian.idv.hk/feed.xml∫ntegrabℓε ∂ifferentiαℓsunorganised memo, notes, code, data, and writings of random topicsAdrian S. Tamrighthandabacus@users.github.comTwelvefold of combinatorics2019-04-28T23:31:58-04:002019-04-28T23:31:58-04:00https://www.adrian.idv.hk/twelvefold<p>Recently I came across the term “<a href="https://en.wikipedia.org/wiki/Twelvefold_way">Twelvefold way</a>” of combinatorics. I wish I knew the term earlier, as I studied many of these in graduate school and collected “recipes” for them. I always thought of them as some cases of the <a href="https://en.wikipedia.org/wiki/P%C3%B3lya_urn_model">urn models</a> of George Pólya, but they indeed have a neat classification. I try to restate the twelve cases and derive their solutions below.</p> <h2 id="terms">Terms</h2> <p>Generically, the twelvefold way models the count of possible mappings between two finite sets, <script type="math/tex">N</script> and <script type="math/tex">X</script>, with cardinalities <script type="math/tex">n</script> and <script type="math/tex">x</script> respectively. The mapping is <script type="math/tex">f: N\mapsto X</script>, and the properties of the mapping distinguish the twelve cases. 
First the nature of mapping:</p> <ol> <li>No condition, simply <script type="math/tex">\forall a\in N,\ \exists b\in X</script></li> <li>Injective <script type="math/tex">f</script>, so distinct elements map to distinct images: <script type="math/tex">\forall a\ne a'\in N,\ f(a)\ne f(a')</script></li> <li>Surjective <script type="math/tex">f</script>: <script type="math/tex">\forall b\in X,\ \exists a\in N, f(a)=b</script></li> </ol> <p>Then the equivalence of mappings:</p> <ol> <li><script type="math/tex">f</script>, simple</li> <li><script type="math/tex">f\circ S_n</script>, equality up to a permutation of <script type="math/tex">N</script></li> <li><script type="math/tex">S_x \circ f</script>, equality up to a permutation of <script type="math/tex">X</script></li> <li><script type="math/tex">S_x \circ f \circ S_n</script>, equality up to a permutation of <script type="math/tex">N</script> and <script type="math/tex">X</script></li> </ol> <p>If we visualize the above using a ball and urn model, we have a finite set of balls <script type="math/tex">N</script> and a finite set of urns <script type="math/tex">X</script>, and we are counting how many ways we can put the balls into the urns under different definitions. 
The three natures of mapping mean:</p> <ol> <li>No condition as long as we put all balls into urns</li> <li>Injective: All urns contain no more than one ball</li> <li>Surjective: All urns contain at least one ball, i.e., no urn can be left empty</li> </ol> <p>and the four equivalences of mapping mean:</p> <ol> <li><script type="math/tex">f</script>: All balls and urns are labelled, so everything is distinguishable from each other (<script type="math/tex">f: \textrm{labelled }N\mapsto\textrm{labelled }X</script>)</li> <li><script type="math/tex">f\circ S_n</script>: All urns are labelled, but the balls are identical (<script type="math/tex">f: \textrm{unlabelled }N\mapsto\textrm{labelled }X</script>)</li> <li><script type="math/tex">S_x \circ f</script>: All balls are labelled, but urns are indistinguishable from each other (<script type="math/tex">f: \textrm{labelled }N\mapsto\textrm{unlabelled }X</script>)</li> <li><script type="math/tex">S_x \circ f \circ S_n</script>: Neither balls nor urns are labelled (<script type="math/tex">f: \textrm{unlabelled }N\mapsto\textrm{unlabelled }X</script>)</li> </ol> <p>The twelvefold way gets its name from the <script type="math/tex">3\times 4=12</script> different possible models. 
We can try to make up problems, as follows.</p> <h2 id="1-unrestricted-f-maps-labelled-n-to-labelled-x">(1) Unrestricted f maps labelled N to labelled X</h2> <p>Problem: How many different ways to put <script type="math/tex">n</script> distinct balls into <script type="math/tex">x</script> distinct urns?</p> <p>Solution: Each of the <script type="math/tex">n</script> balls can be put into one of the <script type="math/tex">x</script> urns, so there are <script type="math/tex">x^n</script> ways.</p> <h2 id="2-unrestricted-f-maps-unlabelled-n-to-labelled-x">(2) Unrestricted f maps unlabelled N to labelled X</h2> <p>Problem: How many ways to put <script type="math/tex">n</script> identical balls into <script type="math/tex">x</script> urns?</p> <p>Solution: We can rephrase the problem as: “If we line up <script type="math/tex">n</script> balls, how many ways can we put <script type="math/tex">x-1</script> separators between the balls?” Then, from left to right, the separators partition the sequence of (identical) balls into <script type="math/tex">x</script> groups. Because we do allow urns to be empty, more than one separator can be put between two balls. It is as if, in a sequence of <script type="math/tex">n+x-1</script> spaces, we put <script type="math/tex">n</script> balls and <script type="math/tex">x-1</script> separators. For example, the following is one result for <script type="math/tex">n=6, x=4</script>:</p> <pre><code>O O | | O | O O O </code></pre> <p>So the number of combinations is</p> <script type="math/tex; mode=display">\binom{n+x-1}{x-1} = \binom{n+x-1}{n}</script> <h2 id="3-unrestricted-f-maps-labelled-n-to-labelled-x">(3) Unrestricted f maps labelled N to unlabelled X</h2> <p>Problem: How many ways to put <script type="math/tex">n</script> distinct balls into <script type="math/tex">x</script> identical urns?</p> <p>Solution: Problem (11) is similar to this problem except with the extra condition that no urn can be left empty. 
The solution of problem (11) is <script type="math/tex">\genfrac{\{}{\}}{0pt}{}{n}{x}</script>. So for this problem, if we count only the non-empty urns, there can be 1, 2, up to <script type="math/tex">x</script> of them (unless <script type="math/tex">n=0</script>, in which case all urns are empty). So the total count of possible ways is</p> <script type="math/tex; mode=display">\sum_{k=0}^x \genfrac{\{}{\}}{0pt}{}{n}{k}</script> <p>Note that the above summation has only up to <script type="math/tex">\min(n,x)</script> nonzero terms, as at most <script type="math/tex">n</script> urns can be non-empty when <script type="math/tex">x>n</script>. But in those cases, we always have <script type="math/tex">\genfrac{\{}{\}}{0pt}{}{n}{x} = 0</script>.</p> <h2 id="4-unrestricted-f-maps-unlabelled-n-to-unlabelled-x">(4) Unrestricted f maps unlabelled N to unlabelled X</h2> <p>Problem: How many ways to put <script type="math/tex">n</script> identical balls into <script type="math/tex">x</script> indistinguishable urns?</p> <p>Solution: If we further require the urns to be non-empty, this becomes problem (12), which is called the partitioning problem: in how many ways can we break down the integer <script type="math/tex">n</script> into a sum of positive integers? Let us denote the count of the partition problem by <script type="math/tex">\genfrac{\lvert}{\rvert}{0pt}{}{n}{x}</script>.</p> <p>Using similar reasoning as in problem (3), if we relax the condition to allow urns to be empty, unless <script type="math/tex">n=0</script>, we can have 1, 2, up to <script type="math/tex">x</script> non-empty urns. So the answer is</p> <script type="math/tex; mode=display">\sum_{k=0}^x \genfrac{\lvert}{\rvert}{0pt}{}{n}{k}</script> <p>Similarly, the above summation has only up to <script type="math/tex">\min(n,x)</script> nonzero terms because if <script type="math/tex">x>n</script>, then <script type="math/tex">\genfrac{\lvert}{\rvert}{0pt}{}{n}{x} = 0</script>. 
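As a sanity check, the counts for the four unrestricted cases (1)–(4) can be verified by brute-force enumeration of all mappings. Below is a Python sketch (the helper names `stirling2` and `part` are my own; they implement the recurrences given later in problems (11) and (12)):

```python
from itertools import product
from math import comb

def stirling2(n, k):
    # Stirling number of the second kind, by the recurrence of problem (11)
    if n == k:
        return 1
    if n == 0 or k == 0:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

def part(n, k):
    # partitions of n into exactly k positive parts, recurrence of problem (12)
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0 or k > n:
        return 0
    return part(n - 1, k - 1) + part(n - k, k)

n, x = 4, 3
maps = list(product(range(x), repeat=n))       # maps[i] = urn of ball i

# (1) labelled balls, labelled urns: every mapping is distinct
count1 = len(maps)
# (2) unlabelled balls: only the occupancy vector of the urns matters
count2 = len({tuple(m.count(u) for u in range(x)) for m in maps})
# (3) unlabelled urns: only the set of non-empty blocks of balls matters
count3 = len({frozenset(frozenset(i for i in range(n) if m[i] == u)
                        for u in range(x) if u in m) for m in maps})
# (4) both unlabelled: only the sorted occupancy counts matter
count4 = len({tuple(sorted((m.count(u) for u in range(x)), reverse=True))
              for m in maps})

assert count1 == x ** n                                      # 81
assert count2 == comb(n + x - 1, n)                          # 15
assert count3 == sum(stirling2(n, k) for k in range(x + 1))  # 14
assert count4 == sum(part(n, k) for k in range(x + 1))       # 4
```

Deduplicating by an equivalence-invariant signature is exactly what the permutations <script type="math/tex">S_n</script> and <script type="math/tex">S_x</script> do in the formal definition.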
But indeed, using the recurrence relation explained in problem (12),</p> <script type="math/tex; mode=display">\genfrac{\lvert}{\rvert}{0pt}{}{n}{x} = \genfrac{\lvert}{\rvert}{0pt}{}{n-1}{x-1} + \genfrac{\lvert}{\rvert}{0pt}{}{n-x}{x}</script> <p>we can derive that:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align} \genfrac{\lvert}{\rvert}{0pt}{}{n+x}{x} &= \genfrac{\lvert}{\rvert}{0pt}{}{n+x-1}{x-1} + \genfrac{\lvert}{\rvert}{0pt}{}{n}{x} \\ &= \genfrac{\lvert}{\rvert}{0pt}{}{n+x-2}{x-2} + \genfrac{\lvert}{\rvert}{0pt}{}{n}{x-1} + \genfrac{\lvert}{\rvert}{0pt}{}{n}{x} \\ &= \genfrac{\lvert}{\rvert}{0pt}{}{n+x-3}{x-3} + \genfrac{\lvert}{\rvert}{0pt}{}{n}{x-2} + \genfrac{\lvert}{\rvert}{0pt}{}{n}{x-1} + \genfrac{\lvert}{\rvert}{0pt}{}{n}{x} \\ &= \cdots \\ &= \genfrac{\lvert}{\rvert}{0pt}{}{n+x-x}{x-x} + \sum_{k=1}^x \genfrac{\lvert}{\rvert}{0pt}{}{n}{k} \\ &= \sum_{k=0}^x \genfrac{\lvert}{\rvert}{0pt}{}{n}{k} \end{align} %]]></script> <p>So we can avoid the summation and find the answer to this problem to be</p> <script type="math/tex; mode=display">\sum_{k=0}^x \genfrac{\lvert}{\rvert}{0pt}{}{n}{k} = \genfrac{\lvert}{\rvert}{0pt}{}{n+x}{x}</script> <h2 id="5-injective-f-maps-labelled-n-to-labelled-x">(5) Injective f maps labelled N to labelled X</h2> <p>Problem: How many ways to put <script type="math/tex">n</script> balls into <script type="math/tex">x</script> urns such that no urn can hold more than one ball?</p> <p>Solution: Because of the restriction that no urn can have more than one ball, it makes sense only for <script type="math/tex">x\ge n</script>. When we place the first ball, we pick one of the <script type="math/tex">x</script> urns. Then the second ball, we pick one of the <script type="math/tex">x-1</script> remaining urns, and so on. 
So the number of ways is:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align} & x(x-1)(x-2)\cdots(x-n+1) \\ =& x^{\underline{n}} \\ =& \frac{x!}{(x-n)!} \end{align} %]]></script> <h2 id="6-injective-f-maps-unlabelled-n-to-labelled-x">(6) Injective f maps unlabelled N to labelled X</h2> <p>Problem: How many ways to put <script type="math/tex">n</script> identical balls into <script type="math/tex">x</script> urns such that no urn can hold more than one ball?</p> <p>Solution: Again, this makes sense only for <script type="math/tex">x\ge n</script>. Since only <script type="math/tex">n</script> out of the <script type="math/tex">x</script> urns will have a ball, this is a combination problem with solution</p> <script type="math/tex; mode=display">\binom{x}{n}</script> <h2 id="7-injective-f-maps-labelled-n-to-unlabelled-x">(7) Injective f maps labelled N to unlabelled X</h2> <p>Problem: How many ways to put <script type="math/tex">n</script> distinct balls into <script type="math/tex">x</script> indistinguishable urns such that no urn can hold more than one ball?</p> <p>Solution: Assume <script type="math/tex">x\ge n</script>. As the urns are indistinguishable, we will find <script type="math/tex">n</script> urns holding the balls labelled 1, 2, up to <script type="math/tex">n</script>, and the remaining urns (if any) are empty. So there is only one way. The general answer is either 1 or 0, depending on whether <script type="math/tex">x\ge n</script>. 
Using the Iverson bracket notation:</p> <script type="math/tex; mode=display">% <![CDATA[ [x\ge n] := \begin{cases} 1 & \textrm{if }x\ge n \\ 0 & \textrm{otherwise} \end{cases} %]]></script> <h2 id="8-injective-f-maps-unlabelled-n-to-unlabelled-x">(8) Injective f maps unlabelled N to unlabelled X</h2> <p>Problem: How many ways to put <script type="math/tex">n</script> identical balls into <script type="math/tex">x</script> indistinguishable urns such that no urn can hold more than one ball?</p> <p>Solution: Indeed this is identical to problem (7), as whether the balls are labelled does not matter in the reasoning of that solution. So the count of ways is <script type="math/tex">[x\ge n]</script>.</p> <h2 id="9-surjective-f-maps-labelled-n-to-labelled-x">(9) Surjective f maps labelled N to labelled X</h2> <p>Problem: How many ways to put <script type="math/tex">n</script> balls into <script type="math/tex">x</script> urns such that no urn is left empty?</p> <p>Solution: If the urns were indistinguishable, this would be problem (11), which has solution <script type="math/tex">\genfrac{\{}{\}}{0pt}{}{n}{x}</script>. But since the urns are labelled, each solution of problem (11) corresponds to <script type="math/tex">x!</script> solutions of this problem, one for each way to assign labels to the <script type="math/tex">x</script> urns. So the solution is</p> <script type="math/tex; mode=display">x! \genfrac{\{}{\}}{0pt}{}{n}{x}</script> <h2 id="10-surjective-f-maps-unlabelled-n-to-labelled-x">(10) Surjective f maps unlabelled N to labelled X</h2> <p>Problem: How many ways to put <script type="math/tex">n</script> identical balls into <script type="math/tex">x</script> urns such that no urn is left empty?</p> <p>Solution: If we line up <script type="math/tex">n</script> balls, there are <script type="math/tex">n-1</script> spaces between adjacent balls. 
We pick <script type="math/tex">x-1</script> of them to place a separator, which results in a partition of the <script type="math/tex">n</script> balls into <script type="math/tex">x</script> groups, each non-empty. So the number of ways is</p> <script type="math/tex; mode=display">\binom{n-1}{x-1}</script> <h2 id="11-surjective-f-maps-labelled-n-to-unlabelled-x">(11) Surjective f maps labelled N to unlabelled X</h2> <p>Problem: How many ways to put <script type="math/tex">n</script> balls into <script type="math/tex">x</script> indistinguishable urns such that no urn is empty?</p> <p>Solution: This is the simplified version of problems (3) and (9). We will denote the answer as</p> <script type="math/tex; mode=display">\genfrac{\{}{\}}{0pt}{}{n}{x}</script> <p>It has the following properties. For the trivial cases of <script type="math/tex">n=x</script> or <script type="math/tex">x=1</script>, there can be only one way, so we have</p> <script type="math/tex; mode=display">\genfrac{\{}{\}}{0pt}{}{n}{n} = \genfrac{\{}{\}}{0pt}{}{n}{1} = 1</script> <p>Also, if <script type="math/tex">x > n</script>, there is no solution, thus</p> <script type="math/tex; mode=display">\genfrac{\{}{\}}{0pt}{}{n}{x} = 0 \quad \textrm{if }x>n>0</script> <p>and for the degenerate cases:</p> <script type="math/tex; mode=display">\begin{align} \genfrac{\{}{\}}{0pt}{}{0}{0} = 1 \\ \genfrac{\{}{\}}{0pt}{}{n}{0} = 0 \quad \textrm{if }n\ge 1 \end{align}</script> <p>Then we consider the other cases of <script type="math/tex">\genfrac{\{}{\}}{0pt}{}{n}{x}</script>: assume we already have the answer for the case of <script type="math/tex">n-1</script> balls and we are adding the <script type="math/tex">n</script>-th ball to the problem. 
We can have two cases:</p> <ol> <li>The <script type="math/tex">n</script>-th ball is in its own urn and the other <script type="math/tex">n-1</script> balls are in <script type="math/tex">x-1</script> urns</li> <li>The <script type="math/tex">n-1</script> balls are already scattered across <script type="math/tex">x</script> urns and the <script type="math/tex">n</script>-th ball is put into one of the <script type="math/tex">x</script> urns</li> </ol> <p>For the latter case, there are <script type="math/tex">x</script> possibilities for the <script type="math/tex">n</script>-th ball. So these two cases form the recurrence relation, when <script type="math/tex">n>x>1</script>:</p> <script type="math/tex; mode=display">\genfrac{\{}{\}}{0pt}{}{n}{x} = \genfrac{\{}{\}}{0pt}{}{n-1}{x-1} + x \genfrac{\{}{\}}{0pt}{}{n-1}{x}</script> <p>Indeed, this is called the Stirling number of the second kind, which is the number of ways to partition <script type="math/tex">n</script> elements into <script type="math/tex">x</script> subsets:</p> <script type="math/tex; mode=display">\genfrac{\{}{\}}{0pt}{}{n}{x} = \frac{1}{x!}\sum_{k=0}^x (-1)^k\binom{x}{k}(x-k)^n</script> <h2 id="12-surjective-f-maps-unlabelled-n-to-unlabelled-x">(12) Surjective f maps unlabelled N to unlabelled X</h2> <p>Problem: How many ways to put <script type="math/tex">n</script> identical balls into <script type="math/tex">x</script> indistinguishable urns such that no urn is left empty?</p> <p>Solution: This is the simplified version of problem (4) and the solution is the partition number, which is the number of ways to write the integer <script type="math/tex">n</script> as a sum of <script type="math/tex">x</script> positive integers. We denote this number as</p> <script type="math/tex; mode=display">\genfrac{\lvert}{\rvert}{0pt}{}{n}{x}</script> <p>and it has the following properties. 
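As an aside, the recurrence and the explicit formula for the Stirling numbers in problem (11) can be cross-checked numerically. A minimal Python sketch (the function names are mine):

```python
from math import comb, factorial

def stirling2(n, x):
    # recurrence from problem (11)
    if n == x:
        return 1
    if n == 0 or x == 0:
        return 0
    return stirling2(n - 1, x - 1) + x * stirling2(n - 1, x)

def stirling2_explicit(n, x):
    # inclusion-exclusion formula from problem (11); the sum is always
    # divisible by x!, so integer division is exact
    return sum((-1) ** k * comb(x, k) * (x - k) ** n
               for k in range(x + 1)) // factorial(x)

for n in range(8):
    for x in range(n + 1):
        assert stirling2(n, x) == stirling2_explicit(n, x)
```

The explicit formula counts surjections onto <script type="math/tex">x</script> labelled urns by inclusion–exclusion and divides by the <script type="math/tex">x!</script> urn labelings, matching the reasoning of problem (9).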
The trivial cases <script type="math/tex">x=n</script> and <script type="math/tex">x=1</script> each have only one way, so</p> <script type="math/tex; mode=display">\genfrac{\lvert}{\rvert}{0pt}{}{n}{n} = \genfrac{\lvert}{\rvert}{0pt}{}{n}{1} = 1</script> <p>For <script type="math/tex">x>n</script>, there is no solution, so</p> <script type="math/tex; mode=display">\genfrac{\lvert}{\rvert}{0pt}{}{n}{x} = 0 \quad \textrm{if }x>n>0</script> <p>and for the degenerate cases:</p> <script type="math/tex; mode=display">\begin{align} \genfrac{\lvert}{\rvert}{0pt}{}{0}{0} = 1 \\ \genfrac{\lvert}{\rvert}{0pt}{}{n}{0} = 0 \quad \textrm{if }n\ge 1 \end{align}</script> <p>Then we consider the other cases of <script type="math/tex">\genfrac{\lvert}{\rvert}{0pt}{}{n}{x}</script>:</p> <ol> <li>If some urn has exactly one ball, taking away that urn gives a solution for the case of <script type="math/tex">n-1</script> balls and <script type="math/tex">x-1</script> urns</li> <li>If every urn has at least two balls, taking one ball from each urn gives a solution for <script type="math/tex">n-x</script> balls and <script type="math/tex">x</script> urns</li> </ol> <p>So combining the two cases above we have the following recurrence relation, when <script type="math/tex">n>x>1</script>:</p> <script type="math/tex; mode=display">\genfrac{\lvert}{\rvert}{0pt}{}{n}{x} = \genfrac{\lvert}{\rvert}{0pt}{}{n-1}{x-1} + \genfrac{\lvert}{\rvert}{0pt}{}{n-x}{x}</script> <h2 id="summary">Summary</h2> <table> <thead> <tr> <th> </th> <th>f</th> <th>injective f</th> <th>surjective f</th> </tr> </thead> <tbody> <tr> <td><script type="math/tex">f</script> (LN → LX)</td> <td>sequence<br /><script type="math/tex">x^n</script></td> <td>permutation<br /><script type="math/tex">x^{\underline{n}}</script></td> <td>set partition<br /><script type="math/tex">x!\genfrac{\{}{\}}{0pt}{}{n}{x}</script></td> </tr> <tr> <td><script type="math/tex">f\circ S_n</script> (UN → LX)</td> <td>multicombination<br /><script 
type="math/tex">\binom{n+x-1}{n}</script></td> <td>combination<br /><script type="math/tex">\binom{x}{n}</script></td> <td>composition<br /><script type="math/tex">\binom{n-1}{n-x}</script></td> </tr> <tr> <td><script type="math/tex">S_x \circ f</script> (LN → UX)</td> <td>set partition<br /><script type="math/tex">\sum_{k=0}^x\genfrac{\{}{\}}{0pt}{}{n}{k}</script></td> <td>pigeonhole<br /><script type="math/tex">[x\ge n]</script></td> <td>set partition<br /><script type="math/tex">\genfrac{\{}{\}}{0pt}{}{n}{x}</script></td> </tr> <tr> <td><script type="math/tex">S_x \circ f\circ S_n</script> (UN → UX)</td> <td>partitioning<br /><script type="math/tex">\genfrac{\lvert}{\rvert}{0pt}{}{n+x}{x}</script></td> <td>pigeonhole<br /><script type="math/tex">[x\ge n]</script></td> <td>partitioning<br /><script type="math/tex">\genfrac{\lvert}{\rvert}{0pt}{}{n}{x}</script></td> </tr> </tbody> </table> <h2 id="extension">Extension</h2> <p>The twelvefold way is attributed to Gian-Carlo Rota (according to Wikipedia), and the Wikipedia page also mentions an extension to a twentyfold way, which includes the cases where every urn must hold exactly one ball (i.e., the mapping <script type="math/tex">f</script> is bijective).</p> <p>The paper <a href="https://arxiv.org/pdf/math/0606404.pdf">Let’s Expand Rota’s Twelvefold Way For Counting Partitions!</a> extends the twelvefold way further, to a thirtyfold way.</p>The Naked Presenter2019-04-23T19:37:41-04:002019-04-23T19:37:41-04:00https://www.adrian.idv.hk/nakedpresenter<p>TLDR: A book on presentation. 
Not on slides, but on how to give a presentation in person and how to engage the audience with the topic.</p> <h2 id="chapter-2-preparation">Chapter 2: Preparation</h2> <ul> <li>Identify your audience before your talk</li> <li>The why: What’s my point and why does it matter <ul> <li>many presenters focus on the what, some focus on the how, but seldom on the why</li> <li>why = big picture</li> </ul> </li> <li>rehearse, tell a story <ul> <li>story = identify a problem, identify the causes, how and why you solved the problem</li> </ul> </li> <li>Aristotle on good public speaking <ol> <li>Appeals to reason</li> <li>Appeals to emotions</li> <li>Appeals based on the character and personality of the speaker</li> </ol> </li> <li>Deep or wide? A one-hour lecture is too short to go in both directions</li> <li>Preparation process <ol> <li>Quiet time and place</li> <li>Remove distractions</li> <li>Go analog: Grab a book, sticky notes, index cards</li> <li>Identify core point <ul> <li>the topic is a noun phrase; the core point is a full sentence to take home</li> </ul> </li> <li>Brainstorm</li> <li>Consolidate, edit, and group your ideas into three sections</li> <li>Sketch your visuals</li> <li>Build visuals in software</li> </ol> </li> </ul> <h2 id="chapter-3-connect-with-punch-presence-and-projection">Chapter 3: Connect with Punch, Presence, and Projection</h2> <ul> <li>Opening punch to hook the audience</li> <li>Personal, Unexpected, Novel, Challenging, Humorous</li> <li>Never start with an apology</li> <li>Do not show the structure explicitly, but hint at or articulate it</li> <li>Presence = Focus on here and now <ul> <li>do not be concerned with failure or success</li> <li>honest conversation creates stronger connections with the audience <ul> <li>remove worries about the outcomes so you are able to be your natural self, and the audience will know the difference</li> <li>you are not reading a speech</li> </ul> </li> </ul> </li> <li>Projection = Lasting connection (impression about you) <ul> <li>the way you look, move, and 
sound</li> <li>eye contact</li> </ul> </li> </ul> <h2 id="chapter-4-engage-with-passion-proximity-and-play">Chapter 4: Engage with Passion, Proximity, and Play</h2> <ul> <li>Show passion: put your heart and soul into it <ul> <li>show you are interested rather than trying to make it interesting</li> <li>our brains are activated by the movements and feelings of others</li> </ul> </li> <li>The presenter is an artist on stage: a performance with great content, powerful visuals, and an emotional touch to make a lasting connection with the audience</li> <li>Story of Apple’s 1984 commercial <ul> <li>exhibits solid conflicts and contrasts</li> <li>emotions: use stories and examples</li> <li>sell the experience, not the features of the thing itself</li> </ul> </li> <li>Interact using proximity <ul> <li>stay close to the audience</li> <li>come out from behind the barriers; use a remote</li> </ul> </li> <li>Sir Ken Robinson on public speaking <ul> <li>speak to individuals, not an abstract group</li> <li>relax</li> <li>be conversational and make a connection</li> <li>know your material</li> <li>prepare but do not rehearse (so you don’t lose the natural connection with the audience)</li> <li>leave room for improvisation</li> <li>humor is for engagement: if they are laughing then they are listening</li> </ul> </li> <li>Spirit of play improves learning and stimulates creative thinking <ul> <li>humor leads to joyfulness, which leads to productivity</li> <li>play: exploration and discovery</li> </ul> </li> </ul> <h2 id="chapter-5-sustain-with-pace-and-participation">Chapter 5: Sustain with Pace and Participation</h2> <ul> <li>Attention span: 18 minutes, or shorter <ul> <li>the 10-minute rule: do something to change things up to keep people engaged</li> <li>give the brain a break every 10 minutes</li> </ul> </li> <li>Slow down the rate of speaking <ul> <li>it is natural to speak too fast, as a result of increased adrenaline</li> </ul> </li> <li>Make use of variation in rate, volume, pitch <ul> <li>read the 
situation and make adjustments on the spot</li> </ul> </li> <li>Never go overtime (80% full)</li> <li>Participation <ul> <li>“Tell me and I forget. Teach me and I remember. Involve me and I learn” (Benjamin Franklin)</li> <li>Have your audience do something, give them an experience</li> <li>e.g., ask questions</li> </ul> </li> <li>Steve Jobs presentations <ul> <li>Walks on stage, confident but humble: Establish connection with the audience</li> <li>No agenda slide, but give people an idea where you’re going (“I’ve got four things I’d like to talk about”) <ul> <li>SJ often structures his talks around 3-4 parts with one theme</li> </ul> </li> <li>Show your enthusiasm: Believes what he says; sincere, authentic</li> <li>Be positive, upbeat, humorous</li> <li>Not about numbers, but what the numbers mean <ul> <li>break down the numbers and compare them</li> </ul> </li> <li>Make it visual</li> <li>Introduce something unexpected</li> <li>Include only what is necessary</li> <li>Vary the pace and change techniques: Mixes in video clips, images, stories, data, etc.</li> <li>Save the best for last</li> <li>Go the appropriate length</li> </ul> </li> </ul> <h2 id="chapter-6-end-with-a-powerful-finish">Chapter 6: End with a Powerful Finish</h2> <ul> <li>Sticky ending: Simple, concrete, unexpected, credible</li> <li>Take it back to the beginning</li> <li>Summarize the main points</li> <li>Tell a story</li> <li>Make them laugh</li> <li>End on a positive note that gives people hope and encouragement to keep learning and investigating on their own</li> </ul> <h2 id="chapter-7-continuous-improvement-through-persistence">Chapter 7: Continuous Improvement Through Persistence</h2> <ul> <li>Three C’s of presenting with impact <ul> <li>Contribution</li> <li>Connection</li> <li>Change</li> </ul> </li> </ul>Karatsuba method of multiplication2019-04-18T02:56:00-04:002019-04-18T02:56:00-04:00https://www.adrian.idv.hk/karatsuba<p>This is a <a href="https://www.wired.com/story/mathematicians-discover-the-perfect-way-to-multiply/">way of doing multiplication</a> of large numbers that I learned today. It allows multiplication in below-<script type="math/tex">O(n^2)</script> complexity.</p> <p>Consider the simple case of two multi-digit decimal numbers, written in the form of <script type="math/tex">a\times 10^k+b</script> and <script type="math/tex">c\times 10^k + d</script> for some positive integer <script type="math/tex">k</script>. Then their product is</p> <script type="math/tex; mode=display">ac\times 10^{2k} + (ad+bc)\times 10^k + bd</script> <p>But we can find that <script type="math/tex">(a+b)\times (c+d) = ac+ad+bc+bd</script>, so we can simply calculate the product by the algorithm below:</p> <ol> <li>Multiplication: <script type="math/tex">ac</script> and <script type="math/tex">bd</script></li> <li>Addition: <script type="math/tex">a+b</script> and <script type="math/tex">c+d</script></li> <li>Multiplication: <script type="math/tex">(a+b)\times(c+d)</script></li> <li>Subtraction: <script type="math/tex">ad+bc = (a+b)(c+d) - ac - bd</script></li> <li>Addition, with digit shifting: <script type="math/tex">ac\times 10^{2k} + (ad+bc)\times 10^k + bd</script></li> </ol> <p>So if each of <script type="math/tex">a,b,c,d</script> is of <script type="math/tex">k</script> digits, the naive multiplication will do <script type="math/tex">(2k)^2</script> single-digit multiplications. 
The Karatsuba method will do:</p> <ol> <li><script type="math/tex">2k^2</script> single-digit multiplications</li> <li><script type="math/tex">2k</script> single-digit additions</li> <li><script type="math/tex">k^2</script> single-digit multiplications</li> <li><script type="math/tex">2k</script> single-digit subtractions, twice</li> <li><script type="math/tex">2k</script> single-digit additions, twice</li> </ol> <p>So a total of <script type="math/tex">3k^2</script> multiplications and <script type="math/tex">10k</script> additions/subtractions, which is an obvious reduction from <script type="math/tex">4k^2</script> multiplications. Recursive application gives a complexity of <script type="math/tex">\Theta(n^{\log_2 3})</script> for multiplying two <script type="math/tex">n</script>-digit numbers.</p> <p>Generalizing the Karatsuba method, we can assume the two numbers are of the form <script type="math/tex">a\times p^k+b</script> and <script type="math/tex">c\times p^k+d</script> for a <script type="math/tex">p</script>-ary number. 
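The five steps above translate directly into a recursive routine. Here is a minimal Python sketch for base 10 (my own illustration, not an optimized implementation):

```python
def karatsuba(u, v):
    # multiply non-negative integers with three recursive multiplications
    if u < 10 or v < 10:
        return u * v                  # single-digit base case
    k = max(len(str(u)), len(str(v))) // 2
    p = 10 ** k                       # split around the k-th digit
    a, b = divmod(u, p)               # u = a*10^k + b
    c, d = divmod(v, p)               # v = c*10^k + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    ad_bc = karatsuba(a + b, c + d) - ac - bd   # = ad + bc
    return ac * p * p + ad_bc * p + bd

assert karatsuba(1234, 5678) == 1234 * 5678
```

In practice one would switch to the naive multiplication below some size threshold, since the recursion overhead dominates for small operands.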
But we need both <script type="math/tex">b</script> and <script type="math/tex">d</script> to be <script type="math/tex">k</script> digits to apply this method.</p> <p>If we split an <script type="math/tex">n</script>-digit number not into two parts, as above, but into <script type="math/tex">m\ge 2</script> parts, then we have the <a href="https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication">Toom-Cook multiplication</a>, of complexity <script type="math/tex">\Theta(c(m)n^\epsilon)</script>, where <script type="math/tex">c(m)</script> is the cost of the additions of small constants, <script type="math/tex">n^\epsilon</script> the time for the sub-multiplications, and <script type="math/tex">\epsilon = \log(2m − 1) / \log(m)</script></p> <h2 id="reference">Reference</h2> <ul> <li><a href="https://brilliant.org/wiki/karatsuba-algorithm/">https://brilliant.org/wiki/karatsuba-algorithm/</a></li> <li>The picture below from <a href="https://www.wired.com/story/mathematicians-discover-the-perfect-way-to-multiply/">https://www.wired.com/story/mathematicians-discover-the-perfect-way-to-multiply/</a> makes the method easy to understand</li> </ul> <p><img src="https://media.wired.com/photos/5cb10d6b439ed155d5df38cc/master/w_700,c_limit/KaratsubaMethod_560-1065x1720.jpg" alt="" /></p>Team Geek2019-04-17T19:06:27-04:002019-04-17T19:06:27-04:00https://www.adrian.idv.hk/teamgeek<p>TLDR: A book summarizing a lot of common sense on working in an engineering organization. 
It proposes the concept of “HRT” – humility, respect, trust – as the general rule for dealing with people, whether they are coworkers, managers, or customers.</p> <h2 id="chapter-1-the-myth-of-the-genius-programmer">Chapter 1: The Myth of the Genius Programmer</h2> <p>Linus’ real achievement: lead people and coordinate their work</p> <ul> <li>Stallman did not write all of GNU</li> <li>Ken Thompson and Dennis Ritchie did not write the whole of UNIX</li> </ul> <p>Insecurity</p> <ul> <li>people are afraid of others seeing and judging their work-in-progress</li> <li>Hiding is considered harmful <ul> <li>collaboration moves faster</li> <li>“fail early, fail fast, fail often”: <strong>tight feedback loops</strong></li> <li>truck factor of project</li> <li>high-bandwidth, low-friction team connection</li> <li>working alone is riskier than working with others</li> </ul> </li> </ul> <p>Three pillars of collaboration</p> <ul> <li><strong>humility</strong>: open to self improvement <ul> <li>Lose the ego: self-confidence is good, but don’t become a know-it-all</li> <li>build group pride, the collective ego</li> </ul> </li> <li><strong>respect</strong>: care about others, appreciate their abilities and achievements <ul> <li>constructive criticism = respect, care about improvement</li> </ul> </li> <li><strong>trust</strong>: believe others are competent and will do the right thing <ul> <li>Failure is an option: the key to learning is to document the failures (postmortems) <ul> <li>what was learned? 
What is going to change?</li> <li>summary, timeline of events, primary cause, impact and damages, fixes, preventions implemented, lessons learned</li> </ul> </li> </ul> </li> </ul> <p>Software development is a team sport</p> <ul> <li>Almost all social conflict is due to lack of humility, respect, trust</li> </ul> <p>Adapt the system to your desires, get the system to do your work</p> <p>Once you reach a local maximum on your team, you stop learning</p> <ul> <li>then get bored</li> <li>then become obsolete</li> <li>humility = willing to learn</li> </ul> <p>Top down engineer = get full picture of every call before proceeding to tackle the bug</p> <p>Bottom up engineer = dive then dig the way out</p> <p>Vulnerable to influence</p> <ul> <li>in order to be heard properly, you first need to listen to others</li> <li>do not live in defence. Teammates are collaborators, not competitors</li> </ul> <h2 id="chapter-2-building-an-awesome-team-culture">Chapter 2: Building an Awesome Team Culture</h2> <p>Story of a sourdough: starter yeast, strong enough to overtake any other wild strains of yeast or bacteria</p> <p>Team culture: set of shared experiences, values, goals</p> <ul> <li>continues to change and develop</li> <li>elements of a culture: <strong>design doc, TDD, code review</strong></li> <li>root culture by founders and earliest employees <ul> <li>team leader does not curate the culture of the team</li> <li>when someone joins the team, she picks up culture from everyone she works with</li> </ul> </li> </ul> <p>Good company: allow engineers to safely share ideas and have a voice in the decision making process</p> <ul> <li>top down management: alpha engineer is the team lead and lesser engineers are hired as team members <ul> <li>subservient team members are cheaper and easier to push around</li> <li>but it will be hard to hire great engineers</li> <li>“if she can drive the bus, she will not want to ride the bus”</li> </ul> </li> <li>consensus driven management, entire team
participates in the decision making process</li> <li>crappy leaders are also too insecure to deal with great engineers, and also tend to boss people around</li> <li>introverted people rarely excel in an aggressive environment <ul> <li>discourages them from being active participants</li> </ul> </li> </ul> <p>Communication</p> <ul> <li>as few people as necessary in a synchronous communication</li> <li>broader audience for asynchronous communication</li> <li>if you don’t expend any effort on good communication, you’ll waste considerable effort doing work that’s either unnecessary or already being done by other members of your team</li> </ul> <p>Mission statement in an engineering team</p> <ul> <li>clarifying what the team should and shouldn’t be working on</li> <li>not: marketing-speak</li> <li>should include: direction and scope limiter <ul> <li>discussion to come to an agreement on product direction</li> </ul> </li> <li>if radical changes happen, team members need to be honest and reevaluate whether the mission still makes sense</li> </ul> <p>Efficient meetings: like sewage treatment plants, few, far between, and downwind</p> <ul> <li>standing meeting: every week, absolutely basic announcements <ul> <li>anything worth deeper discussion should take place after the meeting</li> <li>people should be happy to leave the meeting once the main part of it is done</li> </ul> </li> <li>meeting with 5+ people, one arbiter to make decisions</li> <li>meetings = interruption to “make time” <ul> <li>“maker’s schedule” vs “manager’s schedule” (Paul Graham)</li> <li>makers need 20-30 hours of “make time” set aside in larger blocks</li> </ul> </li> </ul> <p>Five rules for meetings</p> <ol> <li>only invite people who absolutely need to be there</li> <li>have an agenda and distribute it well before the meeting <ul> <li>people start reading email in a meeting = people should not be in the meeting</li> </ul> </li> <li>end the meeting early if goals are accomplished</li> <li>keep the meeting on track</li>
<li>schedule the meeting near other interrupt points in the day (e.g. lunch, end of day)</li> </ol> <p>Design docs</p> <ul> <li>owned by one, authored by 2-3, reviewed by a larger set</li> <li>high level blueprint of the project</li> <li>low cost way to communicate what you want to do and how you intend to do it</li> <li>at the design doc stage, it is easier to accept criticism <ul> <li>because not yet invested weeks writing code</li> <li>update the doc once coding starts, as the project grows and changes</li> </ul> </li> <li>control time: doc should not dominate project time</li> </ul> <p>Communication tools</p> <ul> <li>mailing list: history of your project, easy to refer to for newcomers</li> <li>online chat: quick request to a teammate without interrupting her work <ul> <li>old time: IRC channel group chat</li> </ul> </li> <li>issue tracker: processing and triaging bugs <ul> <li>just a specialized bulletin board</li> </ul> </li> <li>code comments: most useful at function or method level</li> </ul> <h2 id="chapter-3-every-boat-needs-a-captain">Chapter 3: Every Boat Needs a Captain</h2> <p>Someone needs to get into the driver’s seat</p> <ul> <li>driver = motivated, impatient</li> <li>help team resolve conflicts, make decisions, coordinate people</li> </ul> <p>Why management</p> <ul> <li>Peter principle (so don’t force people into management)</li> <li>to scale yourself</li> </ul> <p>Management</p> <ul> <li>“carrot and stick” method of management <ul> <li>ineffective and harmful to engineers’ productivity</li> <li>only for assembly line workers of years past, because those workers could be trained in days</li> </ul> </li> <li>engineers need nurturing, time, and space to think and create</li> <li>traditional managers worry about how to get things done, leaders worry about what things get done</li> <li>Google tech lead manager: responsible for tech direction for all of a product in addition to the careers and happiness of the engineers on the team</li> </ul> <p>Quantifying management work is
difficult. Making the team happy and productive is one measure.</p> <p>Resist the urge to manage</p> <ul> <li>otherwise: micromanagement, ignoring low performers, hiring pushovers</li> <li><strong>servant leadership</strong>: smooth the way, advise when necessary <ul> <li>manage both the technical and social health of the team</li> </ul> </li> </ul> <p>Antipatterns</p> <ul> <li>hiring pushovers: ppl not as smart or ambitious as you <ul> <li>they feel more insecure than you</li> <li>without them, you have opportunities to investigate new possibilities</li> </ul> </li> <li>ignore low performers: the hardest part of dealing with humans is handling someone who isn’t meeting expectations <ul> <li>hope is not a strategy</li> <li>hurts team morale, and does not help the low performers grow</li> <li>sometimes they merely need some encouragement or direction <ul> <li>temporary micromanagement can help</li> <li>a lot of HRT, specific goal, small and incremental goals</li> </ul> </li> </ul> </li> <li>ignore human issues: managers need to balance focus on both the technical and human side <ul> <li>without empathy to ppl, the manager loses respect from ppl</li> </ul> </li> <li>be everyone’s friend: Be a tough leader without tossing existing friendships <ul> <li>socially connected with lunch</li> <li>informal conversations</li> </ul> </li> <li>compromise hiring bar: cost in hiring = savings in team productivity <ul> <li>Steve Jobs “A people hire other A people.
B people hire C people”</li> <li>without raw material for a great team, you are doomed</li> </ul> </li> <li>treat them like your children: give them opportunity to be responsible for their job <ul> <li>do not micromanage or be disrespectful of their abilities</li> <li>trust people</li> <li>Google tech shop: stealing is possible, but the cost of treating the workforce like children is more than a few pens or USB drives</li> </ul> </li> </ul> <p>Leadership patterns</p> <ul> <li>lose the ego <ul> <li>humility is not lacking confidence</li> <li>trust your team, respect their abilities, even if they’re new to the team</li> <li>leaders drive, people decide the nuts and bolts</li> <li>promote greater sense of ownership</li> <li>encourage inquiry, focus on the big picture you want to accomplish</li> </ul> </li> <li>be a Zen master <ul> <li>Less vocally skeptical while still letting your team know you’re aware of the intricacies and obstacles involved in your work</li> <li>maintain calm, no matter what blew up, what crazy things happened</li> <li>ask questions</li> </ul> </li> <li>be a catalyst <ul> <li>lead without official authority: working to build team consensus</li> <li>jump in to clear roadblocks that they cannot get past but are easy for you to handle <ul> <li>e.g., because you know the right person to contact</li> </ul> </li> <li>let your team know it is ok to fail</li> <li>praise an individual in front of the team, constructive criticism in private</li> </ul> </li> <li>be a teacher and a mentor <ul> <li>give team members a chance to learn</li> <li>three things needed: <ul> <li>experience the process and systems;</li> <li>explain to the team;</li> <li>ability to gauge how much help your mentee needs</li> </ul> </li> </ul> </li> <li>set clear goals <ul> <li>create a concise mission statement</li> <li>then sit back and let the team work autonomously</li> <li>goal helps efficiency by not wasting time</li> </ul> </li> <li>be honest <ul> <li>“I will tell you when I can’t tell you
something or if I just don’t know”</li> <li>do not use the compliment sandwich, go straight to the point with respect <ul> <li>clear feedback without candy coating</li> </ul> </li> </ul> </li> <li>track happiness <ul> <li>ask “what do you need?”</li> <li>evenly distribute thankless tasks</li> <li>remember people have a life outside work <ul> <li>unrealistic expectation about the amount of time they can work = burnt out</li> <li>be sensitive to situation changes outside work</li> </ul> </li> </ul> </li> <li>delegate but keep your hands dirty <ul> <li>dirty hands earn respect</li> </ul> </li> <li>seek to replace yourself <ul> <li>give the team the opportunity to take on more responsibility</li> <li>identify who can lead and who is willing to lead</li> </ul> </li> <li>be bold to make waves <ul> <li>problems will not work themselves out, don’t wait</li> <li>delaying the inevitable causes untold damage</li> </ul> </li> <li>shield your team from chaos</li> <li>share information with the team</li> <li>let the team know when they are doing well</li> <li>always say yes if something is easy to undo <ul> <li>let people try and learn</li> </ul> </li> </ul> <p><strong>People are like plants</strong>, some need more light and some need more water.
It is the manager’s job to figure out which needs what and give it to them</p> <p>The bigger their stake is in the success of the product, the greater their interest is in seeing it succeed</p> <p>An engineer’s skills are like the blade of a knife: you may spend tens of thousands of dollars to find engineers with the sharpest skills, but if you use that knife for years without sharpening it, you will wind up with a useless dull knife</p> <p>If you can help them see the purpose of their work, you will see a tremendous increase in their motivation and productivity</p> <h2 id="chapter-4-dealing-with-poisonous-people">Chapter 4: Dealing with Poisonous People</h2> <p>Poisonous people: irritating</p> <ul> <li>it is the behaviour to filter out</li> <li>it is not the people who are either good or bad</li> </ul> <p>The founding team needs a strong culture, or other cultures will overgrow it</p> <ul> <li>cultures are self selecting, nice people tend to be attracted to existing nice communities</li> </ul> <p>Threats</p> <ul> <li>team’s attention and focus</li> <li>people are usually not deliberately being jerks; their issues come from ignorance or apathy rather than malice <ul> <li>do not respect other people’s time: do not read manuals and FAQs</li> <li>incapable of accepting consensus decisions <ul> <li>will reopen discussions that have been long settled</li> </ul> </li> <li>overentitlement: keep complaining about a bug without contributing</li> <li>perfectionists: perfect is the enemy of good</li> </ul> </li> </ul> <p>Resolution</p> <ul> <li>redirect the energy of perfectionists</li> <li><strong>do not feed the troll</strong></li> <li>remember your job is to write software, not to appease every visitor <ul> <li>choose your battles carefully</li> </ul> </li> <li>look for the facts, and focus only on the facts</li> <li>look at the long term, react only if there are benefits <ul> <li>HRT culture is irreplaceable, technical contribution is replaceable</li> </ul> </li> </ul> <h2
id="chapter-5-the-art-of-organizational-manipulation">Chapter 5: The Art of Organizational Manipulation</h2> <p>Ideal</p> <ul> <li>proactive, responsibility-seeking to reduce your manager’s workload</li> <li>take risks and don’t fear failure</li> <li>a good manager wants a team to push the envelope to see what they can and what they can’t do</li> <li>you can never over-communicate, don’t hesitate to update your team’s leader on what you’re doing</li> </ul> <p>Reality</p> <ul> <li>bad managers will frequently train their teams to act like children by squashing any initiative, responsibility, or rule breaking <ul> <li>fear of failure</li> <li>insecurity: makes them very conservative, antithetical to the work style of engineers</li> <li>take credit for your successes and blame you for your failures</li> </ul> </li> <li>office politician: spends more time looking impactful than actually being impactful <ul> <li>route around him where possible</li> <li>don’t badmouth him to other people above him, because it is difficult to know who he has hoodwinked and who is wise to him</li> </ul> </li> <li>most companies are not engineering-focused <ul> <li>there is someone who is willing to sacrifice the health and sanity of the employees to meet the needs of the business</li> <li>treat engineers like slaves</li> <li>power struggles: people obsessed with titles and organizational hierarchy</li> <li>lack focus, vision, direction</li> </ul> </li> </ul> <p>Manipulating your organization</p> <ul> <li>it’s easier to ask for forgiveness than permission <ul> <li>have colleagues and friends in your company as a sounding board for your ideas</li> <li>if risky, get a second opinion from a trusted source before acting</li> </ul> </li> <li>make a path: get enough people to buy into your idea <ul> <li>psychology: one will give more weight to the idea when it’s hitting him from multiple directions and not just from you</li> </ul> </li> <li>it’s impossible to simply stop a bad habit, you need to replace it
with a good habit</li> <li>manage upward: <strong>underpromise and overdeliver whenever possible</strong></li> </ul> <p>offensive work vs defensive work</p> <ul> <li>manage tech debt: defensive <ul> <li>aimed at the long-term health of a product</li> </ul> </li> <li>new feature: offensive <ul> <li>user-visible, shiny things</li> </ul> </li> </ul> <p><strong>Lucky people are skilled at creating and noticing chances</strong></p> <p>Powerful friends</p> <ul> <li>connectors: people who know everyone</li> <li>old-timers: carry a lot of institutional knowledge and wield a lot of influence just because they’ve been around for a long time</li> <li>administrative assistants: agents of the executives</li> </ul> <p>Effective Email</p> <ul> <li>three bullets and a call to action; nothing more</li> <li>keep low mental overhead</li> </ul> <p>Plan B:</p> <ul> <li>if you can’t change the system, there’s no point in continuing to put energy into changing it</li> <li>“If I don’t get fired, I’ve done the right thing for everyone. 
If I do get fired, this is the wrong employer to work for in the first place”</li> </ul> <h2 id="chapter-6-users-are-people-too">Chapter 6: Users Are People, Too</h2> <p>Three phases of user engagement</p> <ol> <li>get users to notice your software</li> <li>what people experience when they start using your software</li> <li>how to interact productively with them once they’re firmly engaged with your creation</li> </ol> <p>Public Perception</p> <ul> <li>marketer = antithesis of engineering culture <ul> <li>engineers don’t spin our descriptions of the world</li> <li>we state facts and work to change them</li> <li>the marketing guy = lies</li> </ul> </li> <li>the marketing folks are masters of emotional manipulation</li> </ul> <p>People experience</p> <ul> <li>first impression is important <ul> <li>first minute with a product is critical</li> </ul> </li> <li>underpromise and overdeliver</li> <li>choose your audience <ul> <li>git is popular among alpha geeks for the same reasons UNIX-like OSes are <ul> <li>hard to learn but provides raw access to outrageous power</li> </ul> </li> <li>UNIX and git are counterexamples to the norm</li> </ul> </li> <li>success: measure usage by minutes, not users by head count</li> <li>speed is a feature</li> <li>some of the best software succeeds because it defines the problem narrowly and solves it well</li> <li>hide complexity <ul> <li>an elegant design makes easy things easy and hard things possible</li> <li>do not seal the software so tight that it ends up handcuffing all your users</li> </ul> </li> <li>Trust: people use your software because they want to, not because they’re trapped <ul> <li>users don’t want to participate in the analysis process, they just want to get something done</li> <li>trust is emotional, because of the cumulative set of interactions they’ve had with you</li> </ul> </li> <li>As the number of users increases, their average level of technical ability decreases</li> </ul> <p>Three points on users</p> <ul>
<li>marketing: Be aware of how people perceive your software</li> <li>usability: easy to try, friendly, accessible</li> <li>customer service: proactive engagement with long-term users</li> </ul>Adrian S. Tamrighthandabacus@users.github.comTLDR: A book summarizing a lot of common sense on working in an engineering organization. It proposed the concept “HRT” – humble, respect, trust – as the general rule for dealing with people, whether they are coworkers, managers, customers.Artificial Neural Network using only numpy2019-03-20T22:08:26-04:002019-03-20T22:08:26-04:00https://www.adrian.idv.hk/pyann<p>I know there are tensorflow, pytorch, keras and a whole bunch of other libraries out there, but I need something that can work on <a href="https://termux.com/">Termux</a>, and I had no success with those (at least not after the Python 3.7 upgrade). But after reading again the textbook on how a neural network operates, it doesn’t seem hard to write my own library.</p> <script src="https://gist.github.com/righthandabacus/b2f72d0aa5de61bd03e3ae996298f6d0.js"></script> <p>Here I explain the code:</p> <h2 id="symbols-and-notations">Symbols and notations</h2> <p><img src="https://upload.wikimedia.org/wikipedia/commons/3/30/Multilayer_Neural_Network.png" alt="" /></p> <p>Artificial neural network is a non-linear regression model in stacked layers. Simple regression in statistics takes one input and one output and finds the equation that fits in between. A multilayer neural network (MLP, multilayer perceptron) extends this structure to multiple layers, so the regression on layer <script type="math/tex">n</script> gives output that becomes the input of layer <script type="math/tex">n+1</script>.
Input to layer 0 is the model’s input and the output from the last layer is the model’s output.</p> <p>Using a notation similar to Russell and Norvig’s book <sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>, we can model the NN as follows:</p> <ul> <li>there are <script type="math/tex">N</script> layers in the NN</li> <li>model input is matrix <script type="math/tex">\mathbf{X}</script>, where the convention is to have <em>features</em> of each data point as rows, and different data instances presented as columns. We will have an <script type="math/tex">m\times n</script> matrix for <script type="math/tex">n</script> instances of data, each having <script type="math/tex">m</script> features</li> <li>model reference output is matrix <script type="math/tex">\mathbf{Y}</script>, where again, data of each instance are presented as columns. We will have an <script type="math/tex">r\times n</script> matrix for the same <script type="math/tex">n</script> instances of data, each having <script type="math/tex">r</script> output features</li> <li><script type="math/tex">\mathbf{A}^{(\ell)}</script> is the input of layer <script type="math/tex">\ell</script> and output of layer <script type="math/tex">\ell-1</script>.
It will be a matrix of dimension <script type="math/tex">s\times n</script> if there are <script type="math/tex">s</script> perceptrons on layer <script type="math/tex">\ell-1</script></li> <li><script type="math/tex">\mathbf{A}^{(0)} = \mathbf{X}</script> by definition, and we define the model output <script type="math/tex">\hat{\mathbf{Y}} = \mathbf{A}^{(N)}</script></li> <li> <p>each perceptron (building block of NN) computes <script type="math/tex">z = \mathbf{w}^T\mathbf{a} + b</script> for some weight vector <script type="math/tex">\mathbf{w}</script> and the input to the layer <script type="math/tex">\mathbf{a}</script> for each instance of data, then outputs <script type="math/tex">g(z)</script> for some <em>activation function</em> <script type="math/tex">g()</script>. This is the non-linear function in the regression. In matrix form for all instances of data and the whole layer on layer <script type="math/tex">\ell</script>, it is</p> <script type="math/tex; mode=display">\mathbf{A}^{(\ell)} = g(\mathbf{Z}^{(\ell)}) = g(\mathbf{W}^{(\ell)}\mathbf{A}^{(\ell-1)} + \mathbf{b})</script> <p>where the addition of <script type="math/tex">\mathbf{b}</script> above is broadcast across the columns, i.e., the same <script type="math/tex">\mathbf{b}</script> is added to every data instance. Matrix <script type="math/tex">\mathbf{W}^{(\ell)}</script> is of dimension <script type="math/tex">r\times s</script> since this layer has <script type="math/tex">r</script> perceptrons and the previous layer has <script type="math/tex">s</script> perceptrons.
Matrices <script type="math/tex">\mathbf{A}^{(\ell)}</script> and <script type="math/tex">\mathbf{A}^{(\ell-1)}</script> are of dimensions <script type="math/tex">r\times n</script> and <script type="math/tex">s\times n</script> respectively</p> </li> <li>The activation function <script type="math/tex">g()</script> is commonly one of these: <ul> <li>ReLU: <script type="math/tex">g(z) = \max(0, z)</script></li> <li>logistic: <script type="math/tex">g(z) = \frac{1}{1+e^{-z}}</script></li> <li>hyperbolic tangent: <script type="math/tex">g(z) = \tanh(z)=\frac{e^z - e^{-z}}{e^z + e^{-z}}</script></li> <li>leaky ReLU: <script type="math/tex">g(z) = az</script> for some small <script type="math/tex">a>0</script> when <script type="math/tex">% <![CDATA[ z<0 %]]></script> otherwise <script type="math/tex">g(z)=z</script></li> <li>ELU: <script type="math/tex">g(z) = a(e^z-1)</script> for some small <script type="math/tex">a>0</script> when <script type="math/tex">% <![CDATA[ z<0 %]]></script> otherwise <script type="math/tex">g(z)=z</script></li> </ul> </li> </ul> <p>To <em>train</em> the NN, we feed forward the network with data <script type="math/tex">\mathbf{X}</script> and <script type="math/tex">\mathbf{Y}</script> in each <em>epoch</em> and then use back propagation to update the parameters, then repeat for many epochs in the hope that the parameters will converge to a useful value. First we define a loss function <script type="math/tex">L(\mathbf{Y}, \hat{\mathbf{Y}})</script> to measure the average discrepancy between the NN output <script type="math/tex">\hat{\mathbf{Y}}</script> and the reference output <script type="math/tex">\mathbf{Y}</script> over the <script type="math/tex">n</script> data instances.
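</p>

<p>As a concrete illustration, the per-layer computation <script type="math/tex">\mathbf{A}^{(\ell)} = g(\mathbf{W}^{(\ell)}\mathbf{A}^{(\ell-1)} + \mathbf{b})</script> can be sketched in a few lines of numpy. The function and variable names below are illustrative only, not the internals of the gist above:</p>

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(X, weights, biases, activations):
    """Propagate X (one data instance per column) through all layers."""
    A = X                              # A^(0) = X
    for W, b, g in zip(weights, biases, activations):
        Z = W @ A + b                  # b is (r, 1), broadcast over the n columns
        A = g(Z)                       # elementwise activation
    return A                           # A^(N), the model output
```

<p>Each <code>W</code> must be shaped (perceptrons of this layer) by (perceptrons of the previous layer), matching the dimension bookkeeping above.</p>

<p>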
Then we minimize <script type="math/tex">L</script>, usually by the gradient descent method. On the output layer:</p> <script type="math/tex; mode=display">d\mathbf{A}^{(\Lambda)} = \frac{\partial L}{\partial\hat{\mathbf{Y}}}</script> <p>Otherwise:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align} d\mathbf{Z}^{(\ell)} &= d\mathbf{A}^{(\ell)}g'(\mathbf{Z}^{(\ell)}) \\ d\mathbf{W}^{(\ell)} &= \frac{\partial L}{\partial\mathbf{W}^{(\ell)}} = \frac{1}{n}d\mathbf{Z}^{(\ell)}\mathbf{A}^{(\ell-1)T} \\ d\mathbf{b}^{(\ell)} &= \frac{\partial L}{\partial\mathbf{b}^{(\ell)}} = \frac{1}{n}\sum_{i=1}^n dZ^{(\ell)}_i \\ d\mathbf{A}^{(\ell-1)} &= \frac{\partial L}{\partial\mathbf{A}^{(\ell-1)}} = \mathbf{W}^{(\ell)T}d\mathbf{Z}^{(\ell)} \end{align} %]]></script> <p>where the sum for <script type="math/tex">d\mathbf{b}^{(\ell)}</script> is over all columns of <script type="math/tex">d\mathbf{Z}^{(\ell)}</script>. Then we update the parameters by</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align} \mathbf{W}^{(\ell)} &:= \mathbf{W}^{(\ell)} - \alpha d\mathbf{W}^{(\ell)} \\ \mathbf{b}^{(\ell)} &:= \mathbf{b}^{(\ell)} - \alpha d\mathbf{b}^{(\ell)} \end{align} %]]></script> <p>for some learning rate <script type="math/tex">\alpha</script>. Observing the definition of each differential, they are all partial derivatives of <script type="math/tex">L</script> w.r.t. the parameter to update; hence the two equations above serve as the update rule.
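</p>

<p>The back propagation equations and the update rule translate almost mechanically into numpy. Below is a sketch for a single layer <script type="math/tex">\ell</script>, with my own variable names rather than the gist’s:</p>

```python
import numpy as np

def layer_backward(dA, Z, A_prev, W, b, g_prime, alpha, n):
    """One gradient-descent update for a layer, given dA = dL/dA of that layer."""
    dZ = dA * g_prime(Z)                    # elementwise product with g'(Z)
    dW = dZ @ A_prev.T / n                  # averaged over the n data instances
    db = dZ.sum(axis=1, keepdims=True) / n  # sum over columns, then average
    dA_prev = W.T @ dZ                      # gradient passed down to layer ell-1
    return W - alpha * dW, b - alpha * db, dA_prev
```

<p>Looping this from the output layer back to layer 1, feeding each <code>dA_prev</code> into the next call, performs one epoch of parameter updates.</p>

<p>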
It is common to use binary <a href="https://en.wikipedia.org/wiki/Cross_entropy">cross entropy</a> as the loss function for classification applications: (in scalar form)</p> <script type="math/tex; mode=display">L(y, \hat{y}) = -y\log\hat{y} - (1-y)\log(1-\hat{y})</script> <p>from which we have</p> <script type="math/tex; mode=display">da^{(\Lambda)}_i = \frac{\partial L}{\partial\hat{y}_i} = -\frac{y_i}{\hat{y}_i} + \frac{1-y_i}{1-\hat{y}_i}</script> <h2 id="how-to-use-it">How to use it</h2> <p>Sample code:</p> <pre><code class="language-python">from pyann import pyann

# make N instances of data stacked as columns of numpy array
X, y = prepare_data()
X_train, X_test, y_train, y_test = train_test_split(X, y)

# learn it
layers = [2, 50, 50, 50, 1]
activators = ["relu"] * 4 + ["logistic"]
NN = pyann(layers, activators)
NN.fit(X_train, y_train, 10000, 0.001, printfreq=500)

# use it
y_hat = NN.forward(X_test)
</code></pre> <h2 id="what-can-go-wrong">What can go wrong</h2> <p>The recent O’Reilly book<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup> has a very well-written Chapter 11. I would say, all problems it describes can happen to this code. So you <em>cannot</em> use it to build a deep neural network out of the box.</p> <p>First is the issue of vanishing gradients and exploding gradients. The problem is exacerbated when the network has a lot of layers. The code above did not implement Xavier initialization (just a very simple quasi-truncated normal).</p> <p>Second is the saturation of the ReLU activation functions. It is common to use ReLU, and it may saturate in its flattened region (negative <script type="math/tex">Z</script>), which renders the NN dysfunctional. We did not implement leaky ReLU above, but we do have the exponential linear unit (ELU) with parameter 1 to the rescue. But using it brings a noticeable slowdown.</p> <p>Third, no regularization and no early stopping is implemented.
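</p>

<p>Early stopping can still be approximated from outside the library: train in short bursts and watch the loss on a held-out set. This sketch assumes the <code>pyann</code> interface from the sample code above (<code>fit</code> and <code>forward</code>) and the binary cross entropy loss; the burst size and patience are arbitrary choices of mine, not anything the library provides:</p>

```python
import numpy as np

def bce(Y, Y_hat, eps=1e-12):
    """Binary cross entropy averaged over instances; clip to avoid log(0)."""
    Y_hat = np.clip(Y_hat, eps, 1 - eps)
    return float(np.mean(-Y * np.log(Y_hat) - (1 - Y) * np.log(1 - Y_hat)))

def fit_with_early_stop(NN, X_tr, Y_tr, X_val, Y_val, bursts=20, patience=3):
    """Train in short bursts; stop when the held-out loss stops improving."""
    best, stale = float("inf"), 0
    for _ in range(bursts):
        NN.fit(X_tr, Y_tr, 500, 0.001)       # one short burst of epochs
        loss = bce(Y_val, NN.forward(X_val))
        if loss < best - 1e-6:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:            # no improvement for a while
                break
    return best
```

<p>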
After all, we have no way to provide a test set to the NN model during fitting.</p> <p>Lastly, we did not implement dropout. I heard people do not use it any more in favor of other techniques. But if you want it, we would have to apply masks to the weight matrices <script type="math/tex">W</script>.</p> <p>Apart from being simple, usable, and not depending on sophisticated libraries, this is far from a feature-rich NN framework. Try at your own risk.</p> <div class="footnotes"> <ol> <li id="fn:1"> <p>Stuart Russell and Peter Norvig, Artificial Intelligence, A Modern Approach 3/e. Prentice Hall, 2010. <a href="#fnref:1" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:2"> <p>Aurelien Geron, Hands on Machine Learning with Scikit-Learn and TensorFlow. O’Reilly, 2017. <a href="#fnref:2" class="reversefootnote">&#8617;</a></p> </li> </ol> </div>Adrian S. Tamrighthandabacus@users.github.comI know there are tensorflow, pytorch, kera and a whole bunch of other libraries out there but I need something that can work on Termux but no success (at least no longer after Python 3.7 upgrade). But after reading again the textbook on how a neural network operates, it doesn’t seem hard to write my own library.System design checklist2019-03-17T20:23:10-04:002019-03-17T20:23:10-04:00https://www.adrian.idv.hk/tikz<p>This is a system design checklist I made to summarize a handful of videos I watched on the topic:</p> <p><img src="/img/system_design_checklist.png" alt="" /></p> <p>But the point I want to make is to try out TikZJax, which claims to make the TikZ drawing system work in the browser (WebAssembly required).
It worked partially, and I have no idea how to debug:</p> <script type="text/tikz"> \begin{tikzpicture} \draw (0,5) node[draw,circle,minimum size=2cm,align=center] (case) {use\\ \textcolor{red}{C}ase}; \node (case1) at ++(90:1cm) {who use it}; \node (case2) at ++(135:1cm) {how to use}; \node[align=center] (case3) at ++(180:1cm) {what is\\ forbidden}; \draw (5,5) node[draw,circle,minimum size=2cm] (constr) {\textcolor{red}{C}onstraints}; \node (constr1) at ++(180:1cm) {logging}; \node (constr2) at ++(135:1cm) {latency}; \node (constr3) at ++(90:1cm) {availability}; \node (constr4) at ++(45:1cm) {consistency}; \node (constr5) at ++(0:1cm) {security}; \draw (10,5) node[draw,circle,minimum size=2cm,align=center] (calcu) {capacity\\ \textcolor{red}{C}alculation}; \node[align=center] (calcu1) at ++(90:1cm) {cache\\ memory}; \node[align=center] (calcu2) at ++(45:1cm) {network\\ throughput}; \node (calcu3) at ++(0:1cm) {storage}; \node (calcu4) at ++(315:1cm) {growth}; \node[draw,circle,minimum size=2cm] (design) at (0,0) {\textcolor{red}{D}esign}; \node (design1) at ++(270:1cm) {API design}; \node (design2) at ++(225:1cm) {DB schema}; \node[draw,circle,minimum size=2cm,align=center] (diagram) at (5,0) {Block\\ \textcolor{red}{D}iagram}; \node (diagram1) at ++(0:20mm) {DB tier}; \node[align=center] (diagram2) at ++(72:20mm) {reverse\\ proxy}; \node (diagram7) at ++(36:20mm) {storage}; \node (diagram4) at ++(108:20mm) {app tier}; \node[align=center] (diagram5) at ++(144:20mm) {web tier\\ MVC}; \node (diagram6) at ++(180:20mm) {ZooKeeper}; \node[align=center] (diagram3) at ++(216:20mm) {load balancer\\ gateway}; \node (diagram8) at ++(252:20mm) {CDN}; \node[align=center] (diagram9) at ++(288:20mm) {in-mem\\ DB}; \node (diagram0) at ++(324:20mm) {task queue}; \node[draw,circle,minimum size=2cm] (detail) at (10,0) {\textcolor{red}{D}etails}; \node (detail1) at ++(90:1cm) {algorithm}; \node (detail2) at ++(45:1cm) {DB sharding}; \node[align=center] (detail3) at ++(0:1cm) 
{load\\ balancing}; \node (detail4) at ++(315:1cm) {caching}; \node[align=center] (detail5) at ++(270:1cm) {operation\\ (cleanup, backup, etc)}; \draw (case1) -- (case); \draw (case2) -- (case); \draw (case3) -- (case); \draw (constr1) -- (constr); \draw (constr2) -- (constr); \draw (constr3) -- (constr); \draw (constr4) -- (constr); \draw (constr5) -- (constr); \draw (calcu1) -- (calcu); \draw (calcu2) -- (calcu); \draw (calcu3) -- (calcu); \draw (calcu4) -- (calcu); \draw (design1) -- (design); \draw (design2) -- (design); \draw (diagram1) -- (diagram); \draw (diagram2) -- (diagram); \draw (diagram3) -- (diagram); \draw (diagram4) -- (diagram); \draw (diagram5) -- (diagram); \draw (diagram6) -- (diagram); \draw (diagram7) -- (diagram); \draw (diagram8) -- (diagram); \draw (diagram9) -- (diagram); \draw (diagram0) -- (diagram); \draw (detail1) -- (detail); \draw (detail2) -- (detail); \draw (detail3) -- (detail); \draw (detail4) -- (detail); \draw (detail5) -- (detail); \end{tikzpicture} </script> <p>The above is generated using the code below, which works fine in LaTeX command line but the relative coordinate system are treated as absolute coordinates in the WebAssembly version:</p> <pre><code>\begin{tikzpicture} \draw (0,5) node[draw,circle,minimum size=2cm,align=center] (case) {use\\ \textcolor{red}{C}ase}; \node (case1) at ++(90:1cm) {who use it}; \node (case2) at ++(135:1cm) {how to use}; \node[align=center] (case3) at ++(180:1cm) {what is\\ forbidden}; \draw (5,5) node[draw,circle,minimum size=2cm] (constr) {\textcolor{red}{C}onstraints}; \node (constr1) at ++(180:1cm) {logging}; \node (constr2) at ++(135:1cm) {latency}; \node (constr3) at ++(90:1cm) {availability}; \node (constr4) at ++(45:1cm) {consistency}; \node (constr5) at ++(0:1cm) {security}; \draw (10,5) node[draw,circle,minimum size=2cm,align=center] (calcu) {capacity\\ \textcolor{red}{C}alculation}; \node[align=center] (calcu1) at ++(90:1cm) {cache\\ memory}; \node[align=center] (calcu2) at 
++(45:1cm) {network\\ throughput}; \node (calcu3) at ++(0:1cm) {storage}; \node (calcu4) at ++(315:1cm) {growth}; \node[draw,circle,minimum size=2cm] (design) at (0,0) {\textcolor{red}{D}esign}; \node (design1) at ++(270:1cm) {API design}; \node (design2) at ++(225:1cm) {DB schema}; \node[draw,circle,minimum size=2cm,align=center] (diagram) at (5,0) {Block\\ \textcolor{red}{D}iagram}; \node (diagram1) at ++(0:20mm) {DB tier}; \node[align=center] (diagram2) at ++(72:20mm) {reverse\\ proxy}; \node (diagram7) at ++(36:20mm) {storage}; \node (diagram4) at ++(108:20mm) {app tier}; \node[align=center] (diagram5) at ++(144:20mm) {web tier\\ MVC}; \node (diagram6) at ++(180:20mm) {ZooKeeper}; \node[align=center] (diagram3) at ++(216:20mm) {load balancer\\ gateway}; \node (diagram8) at ++(252:20mm) {CDN}; \node[align=center] (diagram9) at ++(288:20mm) {in-mem\\ DB}; \node (diagram0) at ++(324:20mm) {task queue}; \node[draw,circle,minimum size=2cm] (detail) at (10,0) {\textcolor{red}{D}etails}; \node (detail1) at ++(90:1cm) {algorithm}; \node (detail2) at ++(45:1cm) {DB sharding}; \node[align=center] (detail3) at ++(0:1cm) {load\\ balancing}; \node (detail4) at ++(315:1cm) {caching}; \node[align=center] (detail5) at ++(270:1cm) {operation\\ (cleanup, backup, etc)}; \draw (case1) -- (case); \draw (case2) -- (case); \draw (case3) -- (case); \draw (constr1) -- (constr); \draw (constr2) -- (constr); \draw (constr3) -- (constr); \draw (constr4) -- (constr); \draw (constr5) -- (constr); \draw (calcu1) -- (calcu); \draw (calcu2) -- (calcu); \draw (calcu3) -- (calcu); \draw (calcu4) -- (calcu); \draw (design1) -- (design); \draw (design2) -- (design); \draw (diagram1) -- (diagram); \draw (diagram2) -- (diagram); \draw (diagram3) -- (diagram); \draw (diagram4) -- (diagram); \draw (diagram5) -- (diagram); \draw (diagram6) -- (diagram); \draw (diagram7) -- (diagram); \draw (diagram8) -- (diagram); \draw (diagram9) -- (diagram); \draw (diagram0) -- (diagram); \draw (detail1) -- 
(detail); \draw (detail2) -- (detail); \draw (detail3) -- (detail); \draw (detail4) -- (detail); \draw (detail5) -- (detail); \end{tikzpicture} </code></pre>Adrian S. Tamrighthandabacus@users.github.comThis is a system design checklist I made to summarize a handful of videos I watched on the topic:Tic-tac-toe using AI from the last century2019-03-15T19:28:28-04:002019-03-15T19:28:28-04:00https://www.adrian.idv.hk/tictactoe<p>I want to watch the computer play a game with itself. So I pick the easiest game, tic-tac-toe, and see how well the computer can play. Tic-tac-toe is never an interesting game. Any sensible human playing it will make a draw. So if the computer is smart enough, it should also make a draw.</p> <h2 id="skeleton-of-self-play-engine">Skeleton of self-play engine</h2> <p>We start with the skeleton. A game that the computer plays against itself is simpler, as I do not need to implement a user interface to obtain human input. So this will be a very simple loop:</p> <pre><code class="language-python">def play(): game = Board() player = 'X' while not game.won(): opponent = 'O' if player == 'X' else 'X' game = move(game, player) print("%s move:" % player) print(game) player = opponent winner = game.won() print() if not winner: print("Tied") else: print("%s has won" % winner) </code></pre> <p>But now we need to create a board representation, and some checker to verify the game is over. To keep the board position, we can simply use a 2D array. To check whether we have a winner, we need to check all the ways of winning tic-tac-toe. And to determine if there is a tie, we need to verify that the board is full. So here is the board class:</p> <pre><code class="language-python">import copy class Board: """simple tic-tac-toe board""" def __init__(self, board=None): if board: self.board = copy.deepcopy(board) else: self.board = [[' '] * 3 for _ in range(3)] def place(self, row, col, what): """produce a new board with row and col set to a symbol.
Return None if some symbol already set.""" if self.board[row][col] == ' ': newboard = Board(self.board) newboard[row][col] = what return newboard def __getitem__(self, key): return self.board[key] def __repr__(self): separator = "\n---+---+---\n " return " " + separator.join([" | ".join(row) for row in self.board]) def spaces(self): """tell how many empty spots on the board""" return sum(1 for i in range(3) for j in range(3) if self[i][j] == ' ') def won(self): """check winner. Return the winner's symbol or None""" # check rows for row in self.board: if row[0] != ' ' and all(c == row[0] for c in row): return row[0] # check cols for n in range(3): if self.board[0][n] != ' ' and all(self.board[i][n] == self.board[0][n] for i in range(3)): return self.board[0][n] # check diag if self.board[0][0] != ' ' and all(self.board[n][n] == self.board[0][0] for n in range(3)): return self.board[0][0] if self.board[0][2] != ' ' and all(self.board[n][2-n] == self.board[0][2] for n in range(3)): return self.board[0][2] </code></pre> <p>We can now verify the board works by playing the game as humans:</p> <pre><code class="language-python">def play(): "auto play tic-tac-toe" game = Board() player = 'X' # loop until the game is done print(game) while not game.won(): opponent = 'O' if player == 'X' else 'X' while True: userin = input("Player %s, input coordinate (0-2, 0-2):" % player) nums = "".join(c if c.isdigit() else ' ' for c in userin).split() if len(nums) != 2: continue nums = [int(n) for n in nums] if not all(0 &lt;= n &lt;= 2 for n in nums): continue nextstep = game.place(nums[0], nums[1], player) if nextstep: game = nextstep break print() print("%s move:" % player) print(game) player = opponent winner = game.won() print() if not winner: print("Tied") else: print("%s has won" % winner) </code></pre> <p>and we can see it really works:</p> <pre><code> | | ---+---+--- | | ---+---+--- | | Player X, input coordinate (0-2, 0-2):1,1 X move: | | ---+---+--- | X | ---+---+--- | | Player O, input coordinate (0-2, 0-2):1,1 Player O, input
coordinate (0-2, 0-2):0,2 O move: | | O ---+---+--- | X | ---+---+--- | | Player X, input coordinate (0-2, 0-2):0,3 Player X, input coordinate (0-2, 0-2):1,2 X move: | | O ---+---+--- | X | X ---+---+--- | | Player O, input coordinate (0-2, 0-2):0,0 O move: O | | O ---+---+--- | X | X ---+---+--- | | Player X, input coordinate (0-2, 0-2):1,0 X move: O | | O ---+---+--- X | X | X ---+---+--- | | X has won </code></pre> <h2 id="first-step-of-ai-game-tree-search">First step of AI: Game tree search</h2> <p>The old school way of doing AI on such board games is to do a game tree search. The board class above is actually prepared for that. Imagine we have a position and now it is a particular player’s turn. All possible next positions can be generated as follows:</p> <pre><code class="language-python">next_steps = filter(None, [game.place(r, c, player) for r in range(3) for c in range(3)]) </code></pre> <p>This will contain only the legitimate next step positions, i.e., we place at an empty box only. Doing this recursively for each position, we generate a game tree, with a depth of 9 (because we have 9 spots to play). Below is an illustration from Wikipedia:</p> <p><img src="https://upload.wikimedia.org/wikipedia/commons/d/da/Tic-tac-toe-game-tree.svg" alt="" /></p> <p>The goal of playing the game is to win. If we are at the <em>leaf nodes</em> of the tree, we can determine if a player has won, lost, or it is a draw. So basically we can make an evaluation function to score the state of the end game:</p> <pre><code class="language-python">def evaluate(board): """simple evaluator: +10, -10 for someone won, 0 for tie, None for all other""" winner = board.won() if winner == "X": return 10 elif winner == "O": return -10 if not board.spaces(): return 0 </code></pre> <p>Now we move on to the core of searching the game tree. The idea is that every player will play to her advantage. If one of two possible moves is sure to lose, the player must avoid it.
It will be trivial if it is one level above a leaf node of the game tree. Thus at that level, we can find the worst possible outcome, and the player is supposed to <em>minimize the worst possible score</em>. Then we have the <em>minimax algorithm</em>: At each turn, players are to minimize the maximum loss. And the loss is computed recursively until the leaf node. So here we have our code:</p> <pre><code class="language-python">import random COUNT = 0 PLAYERS = ["X", "O"] def minimax(board, player): """player to move one step on the board, find the minimax (best of the worse case) score""" global COUNT COUNT += 1 opponent = "O" if player == "X" else "X" value = evaluate(board) if value is not None: return value # exact score of the board, at leaf node # possible opponent moves: The worse case scores in different options candscores = [minimax(b, opponent) for b in [board.place(r, c, player) for r in range(3) for c in range(3)] if b] # evaluate the best of worse case scores if player == "X": return max(candscores) else: return min(candscores) def play(): "auto play tic-tac-toe" global COUNT minimizer = True game = Board() # loop until the game is done while not game.won(): player = PLAYERS[minimizer] opponent = PLAYERS[not minimizer] COUNT = 0 candidates = [(b, minimax(b, opponent)) for b in [game.place(r, c, player) for r in range(3) for c in range(3)] if b] if not candidates: break random.shuffle(candidates) # find best move: optimizing the worse case score if player == "X": game = max(candidates, key=lambda pair: pair[1])[0] else: game = min(candidates, key=lambda pair: pair[1])[0] # print board and switch minimizer = not minimizer print() print("%s move after %d search steps:" % (player, COUNT)) print(game) winner = game.won() print() if not winner: print("Tied") else: print("%s has won" % winner) </code></pre> <p>We defined the evaluation function in such a way that an “X” win will have a positive score and an “O” win will have a negative score.
Therefore player “X” will try to maximize the score while player “O” will try to minimize it. Hence we call them the maximizer and minimizer respectively, and we call a node on the game tree where player “X” is to move a <em>maximizer node</em> and otherwise a <em>minimizer node</em>.</p> <p>In the functions above, the maximizer is to maximize the potential score among all possible next steps, and vice versa. The function <code>minimax()</code> finds such a minimax score for a player. So when it is the maximizer, we compute the minimax score of the minimizer on each next-step position. The <code>minimax()</code> function, in turn, computes using the next-step positions of the minimizer, recursing in a similar fashion down to the leaf nodes. The game in this form goes like the following:</p> <pre><code>O move after 549945 search steps: | | ---+---+--- | | O ---+---+--- | | X move after 63904 search steps: | | X ---+---+--- | | O ---+---+--- | | O move after 8751 search steps: | | X ---+---+--- | | O ---+---+--- O | | X move after 1456 search steps: X | | X ---+---+--- | | O ---+---+--- O | | O move after 205 search steps: X | O | X ---+---+--- | | O ---+---+--- O | | X move after 60 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | | O move after 13 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | | O X move after 4 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | X | O O move after 1 search steps: X | O | X ---+---+--- O | X | O ---+---+--- O | X | O Tied </code></pre> <p>The code above intentionally keeps a counter <code>COUNT</code> to see how efficient the search is. And we randomize the possible moves at each step to work around the issue of multiple possible next steps having the same minimax score. Indeed, the game in this form is really slow. One way to see it: a tic-tac-toe game has 9 boxes and each box can either be “X”, “O”, or blank.
So there can only be <script type="math/tex">3^9 = 19683</script> possible positions on the board. But we searched 549945 positions on the first move. This is because we searched a lot of duplicated positions — as the same position can be reached by different combinations of moves, the nodes on the game tree have a lot of repetitions.</p> <h2 id="alpha-beta-pruning">Alpha-beta pruning</h2> <p>The game tree of a game as simple as tic-tac-toe can have orders of magnitude more nodes than the possible positions in the game. If we work on a more complicated game, the game tree can easily become intractable. Therefore, we should avoid searching the whole game tree.</p> <p>Half a century ago, people invented the <em>alpha-beta pruning</em> algorithm to avoid the part of the game tree that is known to be uninteresting. The idea is not hard to understand: Imagine we are on a maximizer node, and we have a number of possible next moves. We check one by one for the minimax score and get some idea of what we can do. So on the first next move, we evaluate the minimax score on behalf of a minimizer. On the second next move, we expect a higher score than what we got from the previous evaluation. However, as a minimizer, it will prefer the lower score. So we can let the minimax function know that whenever the minimizer sees an option with a score lower than this previous score, it can stop (<em>prune the game tree</em>) on this minimizer node – since this minimizer node will never be an option for the maximizer node one level above. A similar idea applies to searching on a minimizer node. Applying this idea recursively, we have the alpha-beta search.</p> <p>Implementing this idea:</p> <pre><code class="language-python">def alphabeta(board, player, alpha=-float("inf"), beta=float("inf")): """minimax with alpha-beta pruning.
It implies that we expect the score to be between the lower bound alpha and the upper bound beta to be useful """ global COUNT COUNT += 1 opponent = "O" if player == "X" else "X" value = evaluate(board) if value is not None: return value # exact score of the board (terminal nodes) # minimax search with alpha-beta pruning children = filter(None, [board.place(r, c, player) for r in range(3) for c in range(3)]) if player == "X": # player is maximizer value = -float("inf") for child in children: value = max(value, alphabeta(child, opponent, alpha, beta)) alpha = max(alpha, value) if alpha &gt;= beta: break # beta cut-off else: # player is minimizer value = float("inf") for child in children: value = min(value, alphabeta(child, opponent, alpha, beta)) beta = min(beta, value) if alpha &gt;= beta: break # alpha cut-off return value </code></pre> <p>As a convention, we call the lower and upper bounds of the minimax score as learned so far <script type="math/tex">\alpha</script> and <script type="math/tex">\beta</script> respectively. They are initially at negative and positive infinity and narrowed down as the alpha-beta search proceeds – we move up the lower bound on maximizer nodes and move down the upper bound on minimizer nodes, as this is the idea of what minimax is about. Whenever we have <script type="math/tex">\alpha \ge \beta</script>, we can prune the branch.
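To see the bounds at work outside of tic-tac-toe, here is a minimal sketch (my own illustration, not part of the original code) that runs the same recursion on a hand-built nested-list tree and records which leaves are actually visited:

```python
def ab(node, maximizing, alpha=-float("inf"), beta=float("inf"), visited=None):
    """Alpha-beta on a nested-list game tree; a leaf is a plain number."""
    if visited is None:
        visited = []
    if not isinstance(node, list):  # leaf: exact score
        visited.append(node)
        return node, visited
    value = -float("inf") if maximizing else float("inf")
    for child in node:
        score, _ = ab(child, not maximizing, alpha, beta, visited)
        if maximizing:
            value = max(value, score)
            alpha = max(alpha, value)
        else:
            value = min(value, score)
            beta = min(beta, value)
        if alpha >= beta:  # remaining children are pruned
            break
    return value, visited

value, visited = ab([[3, 5], [2, 9], [4, 6]], True)
# the leaf 9 is never visited: after seeing 2, the middle branch cannot
# beat the score of 3 already guaranteed to the maximizer
```

Plain minimax would visit all six leaves; the pruned search skips the 9.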
We call the pruning at a maximizer node the beta cut-off and the one at a minimizer node the alpha cut-off.</p> <p>Running this to replace the previous <code>minimax()</code> function will be much faster, as fewer nodes are searched:</p> <pre><code>O move after 30709 search steps: | | ---+---+--- | | O ---+---+--- | | X move after 9785 search steps: | | X ---+---+--- | | O ---+---+--- | | O move after 1589 search steps: | | X ---+---+--- | | O ---+---+--- O | | X move after 560 search steps: X | | X ---+---+--- | | O ---+---+--- O | | O move after 121 search steps: X | O | X ---+---+--- | | O ---+---+--- O | | X move after 53 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | | O move after 13 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | | O X move after 4 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | X | O O move after 1 search steps: X | O | X ---+---+--- O | X | O ---+---+--- O | X | O Tied </code></pre> <h2 id="performance-improvement">Performance improvement</h2> <p>There are a few areas we can improve the program to make it faster.</p> <p>Firstly, we modify the <code>Board</code> class, as below. It will be very useful later. We do not want to use a 2D array any more. Instead, we use a bitboard – using a bit vector to represent the board position. As there are two players and nine boxes, we can use 18 bits to represent all positions, the higher 9 bits for player “X” and the lower 9 bits for player “O”. It will be less convenient when we want to mark a box, but in return, handling a single integer is much faster than a 2D array.</p> <p>Secondly, we use +1 and -1 instead of “X” and “O” in the code as we are now using a bitboard. We convert them into symbols only when we need to print them.
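Before the full class, a quick sanity check of this layout may help; the snippet below is my own sketch (not in the original post), following the same convention as the class that comes next: row-major bits with cell (0,0) at the MSB of each 9-bit half, and the higher half holding “X”:

```python
def cell_mask(row, col, who):
    """Bitmask of one cell; who is +1 for "X" (high half) or -1 for "O"."""
    # bit offset within a 9-bit half: cell (2,2) is bit 0, cell (0,0) is bit 8
    offset = 3 * (2 - row) + (2 - col)
    if who == 1:            # "X" lives in the higher 9 bits
        offset += 9
    return 1 << offset

board = 0
board |= cell_mask(1, 1, 1)    # X plays the centre: bit 4 + 9 = bit 13
board |= cell_mask(0, 0, -1)   # O plays the top-left corner: bit 8
assert board == (1 << 13) | (1 << 8)
```

Marking a box is an OR with such a mask; testing whether a box is free is an AND against both halves.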
The benefit of this is that it is now easier to distinguish the maximizer from the minimizer – by comparing the sign.</p> <pre><code class="language-python">import itertools from gmpy import popcount PLAYERS = [1, -1] # maximizer == 1 COORDS = [(r,c) for r in range(3) for c in range(3)] def symbol(code): """Return the symbol of player""" assert code in PLAYERS return "X" if code == 1 else "O" def grouper(iterable, n, fillvalue=None): # https://docs.python.org/3.7/library/itertools.html args = [iter(iterable)] * n return itertools.zip_longest(*args, fillvalue=fillvalue) class Board: """bit-vector based tic-tac-toe board""" def __init__(self, board=0): self.board = board def mask(self, row, col, who): """Produce the bitmask for row and col The 18-bit vector is row-major, with matrix cell (0,0) the MSB. And the higher 9-bit is for 1 (X) and lower 9-bit is for -1 (O) Args: row, col: integers from 0 to 2 inclusive """ offset = 3*(2-row) + (2-col) if who == 1: offset += 9 return 1 &lt;&lt; offset def place(self, row, col, what: int): """produce a new board with row and col set to a symbol. Return None if some symbol already set.
Args: what: either +1 or -1 """ assert what in PLAYERS mask = self.mask(row, col, what) checkmask = self.mask(row, col, -what) if (mask | checkmask) &amp; self.board: return None # something already on this box return Board(self.board | mask) def __repr__(self): def emit(): omask = 1 &lt;&lt; 8 xmask = omask &lt;&lt; 9 while omask: # until the mask becomes zero yield "O" if self.board &amp; omask else "X" if self.board &amp; xmask else " " omask &gt;&gt;= 1 xmask &gt;&gt;= 1 separator = "\n---+---+---\n " return " " + separator.join(" | ".join(g) for g in grouper(emit(), 3)) def spaces(self): """tell how many empty spots on the board""" # alternative if no gmpy: bin(self.board).count("1") return 9 - popcount(self.board) masks = (0b000000111, 0b000111000, 0b111000000, # rows 0b001001001, 0b010010010, 0b100100100, # cols 0b100010001, 0b001010100 # diags ) def won(self): """check winner. Return the winner (+1 or -1) or None""" shifted = self.board &gt;&gt; 9 for mask in self.masks: if self.board &amp; mask == mask: return -1 if shifted &amp; mask == mask: return 1 </code></pre> <p>In the <code>spaces()</code> function above, we use the popcount function from <a href="https://pypi.org/project/gmpy/">gmpy</a> as it is native and fast. Otherwise we can use the function below as an alternative:</p> <pre><code class="language-python">def popcount(n): return bin(n).count("1") </code></pre> <p>Next, we can consider memoizing the minimax function. In AI literature, this is called the transposition table. This is possible because our minimax function is deterministic and depends only on the board position and the player. It will be harder if the function also depends on the depth of the game tree (which is usually the case for chess) or the evaluation result is not deterministic (e.g., depends on some heuristic or some guesswork involved).
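Because the bitboard makes a position hashable, the same transposition-table pattern can also be spelled with functools.lru_cache; here is a toy sketch (the recursive function is hypothetical, only the caching pattern matters):

```python
import functools

CALLS = 0

@functools.lru_cache(maxsize=None)  # cache keyed on (board_bits, player)
def toy_search(board_bits, player):
    """Stand-in for minimax: deterministic in its arguments, so safe to cache."""
    global CALLS
    CALLS += 1
    if board_bits == 0:
        return 0
    # recurse with the lowest set bit cleared and the player flipped
    return -toy_search(board_bits & (board_bits - 1), -player)

toy_search(0b1011, 1)
first = CALLS          # 4 recursive evaluations for the 3 set bits plus the base case
toy_search(0b1011, 1)
assert CALLS == first  # the repeat call is answered entirely from the cache
```

A hand-rolled dictionary keyed on `(board, player)` does the same job, and is what the code below uses.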
Simple as it is, this can greatly improve performance even on a full game tree search:</p> <pre><code class="language-python">CACHE = {} COUNT = 0 def simple_minimax(board, player): """player to move one step on the board, find the minimax (best of the worse case) score""" # check cache for quick return if (board.board, player) in CACHE: return CACHE[(board.board, player)] global COUNT COUNT += 1 opponent = -player value = evaluate(board) # note: evaluate() needs adapting to the +1/-1 player encoding if value is not None: return value # exact score of the board # possible opponent moves: The worse case scores in different options candscores = [simple_minimax(b, opponent) for b in [board.place(r, c, player) for r, c in COORDS] if b] # evaluate the best of worse case scores if player == 1: value = max(candscores) else: value = min(candscores) # save into cache CACHE[(board.board, player)] = value return value </code></pre> <p>Here we see why a bitboard is beneficial: It is much easier to use two integers as the key to the dictionary <code>CACHE</code>. The performance improvement is significant:</p> <pre><code>O move after 7381 search steps: | | ---+---+--- | | O ---+---+--- | | X move after 0 search steps: | | X ---+---+--- | | O ---+---+--- | | O move after 0 search steps: | | X ---+---+--- | | O ---+---+--- O | | X move after 0 search steps: X | | X ---+---+--- | | O ---+---+--- O | | O move after 0 search steps: X | O | X ---+---+--- | | O ---+---+--- O | | X move after 0 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | | O move after 0 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | | O X move after 0 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | X | O O move after 1 search steps: X | O | X ---+---+--- O | X | O ---+---+--- O | X | O Tied </code></pre> <p>Finally, there are some standard practices to improve alpha-beta search.
Two of them are the <em>heuristic improvement</em> and the <em><a href="https://en.wikipedia.org/wiki/Killer_heuristic">killer heuristic</a></em>.</p> <p>The heuristic improvement means to reorder the children of a node before doing the alpha-beta search. Remember that alpha-beta search checks one child node at a time and narrows the bounds iteratively. If we have the best option as the first child, pruning will happen more often and thus the search will be faster.</p> <p>The killer heuristic follows a similar idea: if a certain move caused pruning in the past, it is believed that the same move will cause pruning again in another similar position.</p> <p>The former is a bit of an art. Indeed, a lot of research has been done to find a better evaluation function for positions of a particular game. If we have a universally correct evaluation function that can tell whether one position is better than another, we do not even need to do a game tree search but rather, just pick the best next step every time according to this function. Fortunately tic-tac-toe is a game simple enough that we have such a function:</p> <pre><code class="language-python">def heuristic_evaluate(board): """heuristic evaluation from &lt;http://www.ntu.edu.sg/home/ehchua/programming/java/javagame_tictactoe_ai.html&gt;""" score = 0 for mask in Board.masks: # 3-in-a-row == score 100 # 2-in-a-row == score 10 # 1-in-a-row == score 1 # 0-in-a-row, or mixed entries == score 0 (no chance for either to win) # X == positive, O == negative oboard = board.board xboard = oboard &gt;&gt; 9 countx = popcount(xboard &amp; mask) counto = popcount(oboard &amp; mask) if countx == 0: score -= int(10**(counto-1)) elif counto == 0: score += int(10**(countx-1)) return score </code></pre> <p>The latter does not need a great mind to craft such an artistic function. We just need to remember what caused the last cut-off.
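The bookkeeping for this is tiny. Here is a sketch of a two-slot killer-move memory with collections.deque (the move names are hypothetical; the full search code below keeps the same idea inline):

```python
from collections import deque

killers = deque(maxlen=2)       # remember only the most recent cut-off moves

def order_moves(moves):
    """Try remembered killer moves first; a stable sort keeps the rest in order."""
    return sorted(moves, key=lambda m: m not in killers)

killers.append("e2e4")          # moves that caused cut-offs earlier
killers.append("d2d4")
print(order_moves(["a2a3", "d2d4", "h2h4", "e2e4"]))
# → ['d2d4', 'e2e4', 'a2a3', 'h2h4']
```

The `maxlen=2` bound means old killer moves age out automatically as new cut-offs are recorded.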
Research has shown that using the last two cut-offs instead of one gives better performance (the power of two random choices?). Thus we can use a <code>deque()</code> to implement the memory.</p> <p>These two techniques are implemented in the alpha-beta search as below. We can modify the condition on the <code>if</code> statements to turn those techniques on or off:</p> <pre><code class="language-python">from collections import deque KILLERS = deque() def alphabeta(board, player, alpha=-float("inf"), beta=float("inf")): """minimax with alpha-beta pruning. It implies that we expect the score to be between the lower bound alpha and the upper bound beta to be useful """ if False and "Use cache": # make alpha-beta with memory: interferes with killer heuristics if (board.board, player) in CACHE: return CACHE[(board.board, player)] global COUNT COUNT += 1 assert player in PLAYERS opponent = -player value = evaluate(board) if value is not None: return value # exact score of the board (terminal nodes) # minimax search with alpha-beta pruning # assumes Board also has check(r, c, who), returning the move's bitmask if the box is empty, and place(mask) masks = filter(None, [board.check(r, c, player) for r,c in COORDS]) children = [(mask, board.place(mask)) for mask in masks] if False and "Heuristic improvement": # sort by a heuristic function to hint for earlier cut-off children = sorted(children, key=lambda pair: heuristic_evaluate(pair[1]), reverse=True) if "Killer heuristic": # remember the move that caused the last (last 2) beta cut-off and check those first # &lt;https://en.wikipedia.org/wiki/Killer_heuristic&gt; children = sorted(children, key=lambda x: x[0] not in KILLERS) if player == 1: # player is maximizer value = -float("inf") for mask, child in children: value = max(value, alphabeta(child, opponent, alpha, beta)) alpha = max(alpha, value) if alpha &gt;= beta: KILLERS.append(mask) if len(KILLERS) &gt; 4: KILLERS.popleft() break # beta cut-off else: # player is minimizer value = float("inf") for _, child in children: value = min(value, alphabeta(child, opponent, alpha, beta)) beta = min(beta, value) if alpha &gt;= beta: break # alpha cut-off
# save into cache if False and "Use cache": CACHE[(board.board, player)] = value return value </code></pre> <p>For a game as simple as tic-tac-toe, these improvements are, unfortunately, not significant in time saving. But they are proven to be effective in larger games like chess. The reason is that the game tree of tic-tac-toe is shallow enough to make the overhead of the extra work outweigh its benefit. However, they do make the number of nodes to search visibly smaller. Below is the result of using only the killer heuristic, without memoization or the heuristic improvement, as in the code above. We reduced the nodes to search on the first step from 30709 to 21667:</p> <pre><code>O move after 21667 search steps: | | ---+---+--- | | O ---+---+--- | | X move after 7169 search steps: | | X ---+---+--- | | O ---+---+--- | | O move after 1514 search steps: | | X ---+---+--- | | O ---+---+--- O | | X move after 532 search steps: X | | X ---+---+--- | | O ---+---+--- O | | O move after 121 search steps: X | O | X ---+---+--- | | O ---+---+--- O | | X move after 53 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | | O move after 13 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | | O X move after 4 search steps: X | O | X ---+---+--- | X | O ---+---+--- O | X | O O move after 1 search steps: X | O | X ---+---+--- O | X | O ---+---+--- O | X | O Tied </code></pre> <h2 id="principal-variation-search--negascout">Principal variation search / NegaScout</h2> <p>There is yet another possible technique to improve on alpha-beta pruning. Notice that alpha-beta pruning starts with a bound <script type="math/tex">[\alpha, \beta]</script> on the expected minimax value, and whenever the searched value is out of this bound, the branch is pruned. So if we have a very tight bound, we can prune more often and the game tree to search will be smaller.
This is the idea of <a href="https://en.wikipedia.org/wiki/Principal_variation_search">principal variation search</a>, which also comes under other names, including NegaScout and the MTD(f) algorithm. Strictly speaking they have some subtle differences in the implementation but share the same philosophy.</p> <p>So when we use this technique on a node of a game tree, we first check the first child node for a value using the ordinary alpha-beta search. Then on the subsequent child nodes, we check them with a <em>zero-window</em>. A zero-window will cause the branch to be pruned quickly, or to fail high on a maximizer node (or fail low on a minimizer node). In the latter case, we are quite sure to find a tighter bound and perform the alpha-beta search again.</p> <p>This, again, poses some overhead to the game tree search, as we might need to search a child node twice: once with a zero window and once with a larger alpha-beta window. The implementation is as follows but, it turns out, not worthwhile (either in terms of the number of nodes searched or the time taken) in a shallow game tree like tic-tac-toe.</p> <pre><code class="language-python">def negascout(board, player, alpha=-float("inf"), beta=float("inf")) -&gt; float: """negascout: minimax with zero-window alpha-beta pruning.
It implies that we expect the score to be between the lower bound alpha and the upper bound beta to be useful """ global COUNT COUNT += 1 assert player in PLAYERS opponent = -player value = evaluate(board) if value is not None: return value # exact score of the board (terminal nodes) # negascout with zero window and alpha-beta pruning masks = filter(None, [board.check(r, c, player) for r,c in COORDS]) children = [(mask, board.place(mask)) for mask in masks] # first child: alpha beta search to find value lbound/ubound bound = negascout(children[0][1], opponent, alpha, beta) if player == 1: # player is maximizer, bound is lbound if bound &gt;= beta: return bound # beta cut-off # subsequent children: zero window on lbound for mask, child in children[1:]: t = negascout(child, opponent, bound, bound+1) if t &gt; bound: # failed-high, tighter lower bound found if t &gt;= beta: bound = t else: bound = negascout(child, opponent, t, beta) # re-search for real value if bound &gt;= beta: return bound # beta cut-off else: # player is minimizer, bound is ubound if bound &lt;= alpha: return bound # alpha cut-off # subsequent children: zero window on ubound for mask, child in children[1:]: t = negascout(child, opponent, bound-1, bound) if t &lt; bound: # failed-low, tighter upper bound found if t &lt;= alpha: bound = t else: bound = negascout(child, opponent, alpha, t) # re-search for real value if bound &lt;= alpha: return bound # alpha cut-off return bound </code></pre> <h2 id="monte-carlo-tree-search">Monte-Carlo tree search</h2> <p>Above we discussed the alpha-beta search with different variations. We made various attempts to narrow down the scope of search on the game tree.</p> <p>There is another way to save time on the search, based on a totally different idea. Suppose we are on a node and it is a particular player’s turn. We still want to minimize our maximum loss.
We can pretend, on each child node, how the game might proceed until the end by playing random moves, repeat this multiple times, and count how often we win or lose. Then we pick the next step that gave us the least percentage of losses. This is a Monte-Carlo search on the game tree. The code is surprisingly simple:</p> <pre><code class="language-python">import random def mcts(board, player): """monte carlo tree search Returns: the fraction of tree search that the player wins """ N = 500 # number of rounds to search count = 0 # count the number of wins for _ in range(N): step = Board(board.board) who = player while step.spaces(): r, c = random.choice(COORDS) nextstep = step.place(r, c, who) if nextstep is not None: who = -who # next player's turn step = nextstep if step.won(): # someone won break if step.won() == player: count += 1 return count / N def play(): "auto play tic-tac-toe" minimizer = True game = Board() # loop until the game is done while not game.won(): player = PLAYERS[minimizer] opponent = PLAYERS[not minimizer] candidates = [(b, mcts(b, opponent)) for b in [game.place(r, c, player) for r, c in COORDS] if b] if not candidates: break random.shuffle(candidates) # find best move: min opponent's score game, score = min(candidates, key=lambda pair: pair[1]) # print board and switch minimizer = not minimizer print() print("%s move on score %f:" % (symbol(player), score)) print(game) winner = game.won() print() if not winner: print("Tied") else: print("%s has won" % symbol(winner)) </code></pre> <p>The <code>while</code> loop in function <code>mcts()</code> will stop only when the game ends. The function counts how many times the player wins among the <code>N</code> repetitions.
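Stripped of the board details, the counting inside <code>mcts()</code> is plain Monte-Carlo estimation of a win probability. A self-contained sketch (my own toy example, not from the post) on a game whose exact answer we know – a die roll of 5 or 6 wins, probability 1/3:

```python
import random

def estimate_win_probability(trials=50000, rng_seed=42):
    """Fraction of random playouts that end in a win (a die roll of 5 or 6)."""
    rng = random.Random(rng_seed)  # fixed seed keeps the estimate reproducible
    wins = sum(1 for _ in range(trials) if rng.randint(1, 6) >= 5)
    return wins / trials

p = estimate_win_probability()
assert abs(p - 1/3) < 0.02  # close to the exact win probability 2/6
```

The estimate converges at roughly a 1/sqrt(trials) rate, which is why `mcts()` needs hundreds of playouts per candidate move.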
When we play with MCTS, we try to minimize the percentage of games the opponent wins – and we no longer have the distinction between maximizer and minimizer nodes.</p> <p>In a small game tree of tic-tac-toe, MCTS performs well:</p> <pre><code>O move on score 0.188000:
   |   |
---+---+---
   | O |
---+---+---
   |   |

X move on score 0.628000:
   |   |
---+---+---
   | O |
---+---+---
   |   | X

O move on score 0.154000:
   |   |
---+---+---
   | O | O
---+---+---
   |   | X

X move on score 0.508000:
   |   |
---+---+---
 X | O | O
---+---+---
   |   | X

O move on score 0.000000:
   |   |
---+---+---
 X | O | O
---+---+---
 O |   | X

X move on score 0.338000:
   |   | X
---+---+---
 X | O | O
---+---+---
 O |   | X

O move on score 0.000000:
   |   | X
---+---+---
 X | O | O
---+---+---
 O | O | X

X move on score 0.000000:
   | X | X
---+---+---
 X | O | O
---+---+---
 O | O | X

O move on score 0.000000:
 O | X | X
---+---+---
 X | O | O
---+---+---
 O | O | X

Tied
</code></pre> <p>Of course, playing the game randomly may not be a good idea. If we know the probability with which our opponent might play each move, we can weight the random moves accordingly. This is indeed the idea behind modern game-playing AI, and finding such a probability vector is the state of the art. But the above is pretty much all we had for the last century.</p> <p>Tic-tac-toe has never been an interesting problem of <a href="https://xkcd.com/1002/">research</a>. Even <a href="https://xkcd.com/832/">xkcd</a> can give you a solution on how to play the game:</p> <p><img src="http://imgs.xkcd.com/comics/tic_tac_toe_large.png" alt="" /></p> <p>All the code above is in the following repository: <a href="https://github.com/righthandabacus/tttai">https://github.com/righthandabacus/tttai</a></p>Adrian S. Tamrighthandabacus@users.github.comI want to watch the computer play a game with itself. So I pick the easiest game, tic-tac-toe, and see how well the computer can play. Tic-tac-toe is never an interesting game. Any sensible human playing it will make a draw. 
So if the computer is smart enough, it should also make a draw.David et al (2016) DeepChess: End-to-end deep neural network for automatic learning in chess2019-03-13T00:00:00-04:002019-03-13T00:00:00-04:00https://www.adrian.idv.hk/dnw16-deepchess<p>The goal of the paper is to derive the evaluation function for chess from scratch using machine learning techniques. From scratch means not even feeding the chess rules into the evaluation function.</p> <p>The evaluation function for chess usually takes a chess position as input and produces a score as output. As a convention, the score is from white’s perspective, and it is a linear combination of all the selected features from the position.</p> <p>The paper uses a neural network, and its training method is as follows: The model to train receives two positions as input and learns to predict which position is better (i.e., the output is binary). The data is from CCRL (www.computerchess.org.uk/ccrl). There are 640K chess games, of which 221,695 were won by white and 164,387 by black. The authors randomly extract 10 positions from each game such that the positions are not from the first 5 moves and are not captures. Each position is converted into 773 bits:</p> <ul> <li>bitboard representation: two sides, six piece types, 64 squares = <script type="math/tex">2\times 6\times 64=768</script> bits</li> <li>additional 5 bits of state: which side to move (white = 1), ability to castle (black and white, king- and queen-side castling)</li> </ul> <p>One random position from a game that white won and one from a game that black won are paired up as training data. With 1M positions on each side, together with swapping the positions of a pair, there are about <script type="math/tex">2\times 10^{12}</script> possible training pairs.</p> <p>The neural network has two stages: The Pos2Vec stage is a deep autoencoder network used as a nonlinear feature extractor. We expect it to convert a chess position into a vector of high-level feature values. 
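To make the 773-bit encoding described above concrete, here is a minimal sketch. The bit ordering, the piece-letter ordering, and the dictionary-based position format are my own assumptions for illustration; the paper does not specify this exact layout.

```python
PIECES = "PNBRQK"  # the six piece types

def encode(position, white_to_move, castling):
    """Encode a chess position into a 773-bit vector.

    position: dict mapping square index 0..63 to (side, piece), e.g. {0: ("w", "R")}
    castling: four 0/1 flags (white king-side, white queen-side,
              black king-side, black queen-side)
    """
    bits = [0] * 773
    for square, (side, piece) in position.items():
        # one bit per (side, piece type, square): 2 * 6 * 64 = 768 bits
        offset = (0 if side == "w" else 1) * 6 * 64 + PIECES.index(piece) * 64
        bits[offset + square] = 1
    bits[768] = 1 if white_to_move else 0  # side to move (white = 1)
    bits[769:773] = castling               # the four castling rights
    return bits

# a toy position: white rook on square 0, black king on square 60
vec = encode({0: ("w", "R"), 60: ("b", "K")}, True, [0, 0, 0, 0])
assert len(vec) == 773 and sum(vec) == 3  # two piece bits plus side-to-move
```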
The Pos2Vec network has <em>five</em> layers of sizes 773-600-400-200-100, using rectified linear units (ReLU) with no regularization.</p> <p>The second stage is DeepChess, which sits on top of two side-by-side Pos2Vec networks; its output layer has 2 softmax values to predict which of the two sides will win. The DeepChess network has 200 inputs (100 each from the two Pos2Vec networks) and has <em>four</em> layers of sizes 400-200-100-2, using ReLU with no regularization, to compare the features of the positions from the two disjoint Pos2Vec networks to determine which one is better.</p> <p>The Pos2Vec network is trained for 200 epochs over 2M positions, of which 1M are white wins and 1M are black wins. The network is trained layer by layer: first as 773-600-773, then 600-400-600, and so on until the five layers are complete. The learning rate starts from 0.005 and is multiplied by 0.98 at the end of each epoch.</p> <p>The DeepChess network is trained with supervision. It uses the previously trained Pos2Vec networks as the initial weights, and after adding the four layers on top of the two Pos2Vec networks, the whole network is trained again with 1M random input pairs for 100 epochs. There are 100K positions each from white wins and black wins to serve as the validation set. Cross-entropy loss is used. The learning rate starts from 0.01 and is multiplied by 0.99 after each epoch.</p> <p>No regularization is needed, as the authors claim that there are orders of magnitude more potential training pairs than the ones used. So we can always use new training samples in each epoch.</p> <p>The network is found to produce an accuracy of 98.2% on training data and 98.0% on validation data.</p> <p>Figure 1 on page 3 of the paper shows the diagram of the network.</p> <p>The paper also proposes some possible improvements. The alternative configuration of 773-100-100-100 for Pos2Vec, and 100-100-2 instead of 4 layers for DeepChess, is called network distillation (using a smaller network). 
This sacrifices a bit of accuracy.</p> <p>To use the neural network, we need a chess engine that does alpha-beta search. But instead of evaluating a numerical value for each position, we store whole positions as <script type="math/tex">\alpha_{pos}</script> and <script type="math/tex">\beta_{pos}</script>. At each new position, we compare it with the existing <script type="math/tex">\alpha_{pos}</script> and <script type="math/tex">\beta_{pos}</script> to check:</p> <ul> <li>if the position is better than <script type="math/tex">\alpha_{pos}</script>, it becomes the new <script type="math/tex">\alpha_{pos}</script></li> <li>if the position is better than <script type="math/tex">\beta_{pos}</script>, the current node is pruned</li> </ul> <h1 id="further-reading">Further reading</h1> <ul> <li>references 3, 4, 5, 6: genetic algorithms for automatic evaluation-function tuning when features are initialized randomly</li> <li>reference 10: deep reinforcement learning to play chess</li> </ul>Adrian S. Tamrighthandabacus@users.github.comThe goal of the paper is to derive the evaluation function for chess from scratch using machine learning techniques. From scratch means not even feeding the chess rules into the evaluation function.Silver et al (2017) Mastering the game of Go without human knowledge2019-03-12T00:00:00-04:002019-03-12T00:00:00-04:00https://www.adrian.idv.hk/sssahghblbclhsdgh17-alphagozero<p>This is the AlphaGo Zero paper that gives details on how the reinforcement learning is done.</p> <p>The predecessor AlphaGo Fan was a success in Oct 2015, as was AlphaGo Lee. It was implemented as two deep neural networks, a policy network (that gives out move probabilities) and a value network (that outputs position evaluation). The policy network was trained initially by <em>supervised learning</em> to predict human expert moves, and then refined by policy gradient reinforcement learning. 
The value network was trained to predict the winner of the game, by having the policy network play against itself. After the two neural networks are trained, they are combined with Monte Carlo tree search to provide a lookahead search.</p> <p>AlphaGo Zero skipped the supervised learning part. It was trained solely by self-play reinforcement learning, starting from random play. The result is a single neural network instead of separate policy and value networks. And it uses only the black and white stones on the board as input features. When it is used, it needs only a simple tree search without Monte Carlo rollouts. The reinforcement learning is performed as follows:</p> <p>Let <script type="math/tex">f_{\theta}</script> be a deep neural network with parameters <script type="math/tex">\theta</script> that takes the raw board representation <script type="math/tex">s</script> of the position as input. The neural network outputs move probabilities and a value <script type="math/tex">(\mathbf{p}, v) = f_{\theta}(s)</script>, where:</p> <ul> <li><script type="math/tex">\mathbf{p}</script>: vector of move probabilities, where the probability of selecting each move <script type="math/tex">a</script> is <script type="math/tex">p_a = \Pr[a\mid s]</script></li> <li><script type="math/tex">v</script>: scalar value, estimating the probability of the current player winning from position <script type="math/tex">s</script></li> </ul> <p>At each position <script type="math/tex">s</script>, an MCTS is guided by the neural network <script type="math/tex">f_{\theta}</script> to find probabilities <script type="math/tex">\pi</script> of playing each move and the value (game winner) <script type="math/tex">z</script>. <script type="math/tex">\pi</script> usually selects much stronger moves than the raw move probabilities <script type="math/tex">\mathbf{p}</script>. The MCTS is a policy improvement operator and <script type="math/tex">z</script> a policy evaluation. 
Then we update the parameters <script type="math/tex">\theta</script> of the neural network to make <script type="math/tex">(\mathbf{p}, v) = f_{\theta}(s)</script> more closely match <script type="math/tex">(\pi, z)</script>.</p> <p>The MCTS always starts from the root state and selects moves to maximize the upper confidence bound <script type="math/tex">Q(s,a)+U(s,a)</script>, where <script type="math/tex">U(s,a)\propto P(s,a)/(1+N(s,a))</script>, until a leaf node <script type="math/tex">s'</script> is reached. Here,</p> <ul> <li>an edge of the game tree is denoted by <script type="math/tex">(s,a)</script>, with <script type="math/tex">s</script> the board state and <script type="math/tex">a</script> the action</li> <li><script type="math/tex">P(s,a)</script> = prior probability</li> <li><script type="math/tex">N(s,a)</script> = visit count</li> <li><script type="math/tex">Q(s,a)</script> = action value</li> </ul> <p>At <script type="math/tex">s'</script>, we then evaluate the prior probabilities and the value <script type="math/tex">(P(s',\cdot), V(s')) = f_{\theta}(s')</script>. We then update each edge <script type="math/tex">(s,a)</script> traversed to increment its visit count <script type="math/tex">N(s,a)</script> and to update its action value</p> <script type="math/tex; mode=display">Q(s,a) = \frac{1}{N(s,a)}\sum_{s'\mid s,a\to s'} V(s')</script> <p>where the summation above is over all simulations that started from position <script type="math/tex">s</script>, took move <script type="math/tex">a</script>, and eventually reached <script type="math/tex">s'</script>.</p> <p>The neural network is trained as follows:</p> <p>The initial weights <script type="math/tex">\theta_0</script> are random. In each subsequent iteration <script type="math/tex">i\ge 1</script>, games of self-play, where each move is identified by subscript <script type="math/tex">t</script>, are generated. 
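The selection and backup rules above can be sketched in a few lines. This keeps only the proportionality U(s,a) ∝ P(s,a)/(1+N(s,a)) stated in the text; the full AlphaGo Zero formula also scales U by the parent visit count, and the constant c_puct and the data layout here are my own assumptions.

```python
class Edge:
    """Statistics kept on a game-tree edge (s, a)."""
    def __init__(self, prior):
        self.P = prior  # prior probability P(s, a) from the network
        self.N = 0      # visit count N(s, a)
        self.W = 0.0    # total leaf value accumulated through this edge

    @property
    def Q(self):
        """Action value Q(s, a): mean of the leaf values V(s') backed up so far."""
        return self.W / self.N if self.N else 0.0

    def U(self, c_puct=1.0):
        """Exploration bonus, proportional to P(s, a) / (1 + N(s, a))."""
        return c_puct * self.P / (1 + self.N)

def select(edges):
    """Pick the action maximizing the upper confidence bound Q(s, a) + U(s, a)."""
    return max(edges, key=lambda a: edges[a].Q + edges[a].U())

def backup(path, value):
    """Increment N and accumulate V(s') on every traversed edge, so Q stays a mean."""
    for edge in path:
        edge.N += 1
        edge.W += value

edges = {"pass": Edge(0.4), "play": Edge(0.6)}
assert select(edges) == "play"   # before any visits, the prior decides
backup([edges["play"]], +1.0)    # back up a leaf evaluation of +1
assert edges["play"].N == 1 and edges["play"].Q == 1.0
```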
At time <script type="math/tex">t</script>, MCTS outputs <script type="math/tex">\mathbf{\pi}_t = \alpha_{\theta_{i-1}}(s_t)</script> using neural network <script type="math/tex">f_{\theta_{i-1}}</script> and moves by sampling the search probabilities <script type="math/tex">\mathbf{\pi}_t</script>. The game terminates when it exceeds a maximum length, or when both players pass. A player passes when the search value falls below a resignation threshold. Therefore, at each time step we collect data <script type="math/tex">(s_t, \mathbf{\pi}_t, z_t)</script>, where <script type="math/tex">z_t \in \{-1, +1\}</script> is the game winner from the perspective of the current player at step <script type="math/tex">t</script>. A new <script type="math/tex">\theta_i</script> is then trained from <script type="math/tex">(s,\mathbf{\pi},z)</script> sampled uniformly among all time steps of iteration <script type="math/tex">i-1</script>.</p> <p>The neural network <script type="math/tex">(p,v) = f_{\theta_i}(s)</script> is adjusted to minimize the error between the predicted value <script type="math/tex">v</script> and the self-play winner <script type="math/tex">z</script>, and to maximize the similarity between the vector of move probabilities <script type="math/tex">p</script> and the search probabilities <script type="math/tex">\pi</script>. This is done using gradient descent on the loss function</p> <script type="math/tex; mode=display">\ell = (z-v)^2 - \pi^T \log p + c \lVert\theta\rVert^2</script> <p>for some regularization parameter <script type="math/tex">c</script> to prevent overfitting. The loss function above sums a mean-squared error and a cross-entropy loss, plus the regularization term.</p> <p>Training of AlphaGo Zero: 4.9M games of self-play were generated, using 1600 simulations for each MCTS, which corresponds to 0.4s of thinking time per move.</p>Adrian S. 
Tamrighthandabacus@users.github.comThis is the AlphaGo Zero paper that gives details on how the reinforcement learning is done.Hypercycloid, hypertrochoid, hypocycloid, and hypotrochoid2019-03-06T20:02:47-05:002019-03-06T20:02:47-05:00https://www.adrian.idv.hk/hypocycloid<p>Hypercycloid, hypocycloid, and their more general versions, the hypertrochoid and hypotrochoid, are curves traced as the locus of a point on a circle rolling on a bigger circle. Like many other locus problems, it is convenient to tackle them with parametric equations.</p> <p>We go with the hypercycloid (aka epicycloid) first. Considering the image below from <a href="https://en.wikipedia.org/wiki/Epicycloid">Wikipedia</a>, we have a bigger circle of radius <script type="math/tex">R</script> with centre fixed at the origin. The smaller, rolling circle of radius <script type="math/tex">r</script> rolls on the outside of the bigger circle such that there is always a single point of contact between the two circles. The locus of interest is drawn by point <script type="math/tex">P</script> on the smaller circle while it rolls.</p> <p><img src="https://upload.wikimedia.org/wikipedia/commons/6/61/Epizykloide_herleitung.svg" alt="" /></p> <p>Observe that when the smaller circle is rolling, its centre always follows a circle of radius <script type="math/tex">R+r</script> centred at the origin. Assume the smaller circle has rolled to the position where its centre is at angle <script type="math/tex">\theta</script> as illustrated; the length of arc it has rolled is <script type="math/tex">R\theta</script>. This is the same length measured on the big or the small circle. Assume point <script type="math/tex">P</script> is the point of contact of the two circles when <script type="math/tex">\theta=0</script>. 
At an arbitrary <script type="math/tex">\theta</script>, the point <script type="math/tex">P</script> is at the angle of <script type="math/tex">\alpha = R\theta/r</script> relative to the current point of contact of the two circles, or at the angle of <script type="math/tex">\alpha+\theta</script> relative to the <script type="math/tex">x</script> axis (this angle is measured in the third quadrant).</p> <p>Given that, we have the coordinates of the centre of the smaller circle to be</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align} x &= (R+r)\cos\theta \\ y &= (R+r)\sin\theta \end{align} %]]></script> <p>and the coordinates of the point <script type="math/tex">P</script> to be</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align} x &= (R+r)\cos\theta - r\cos(\frac{R+r}{r}\theta) \\ y &= (R+r)\sin\theta - r\sin(\frac{R+r}{r}\theta) \end{align} %]]></script> <p>and more generally, if point <script type="math/tex">P</script> is on a circle of radius <script type="math/tex">\rho</script> concentric with the smaller circle, then the parametric formula of the locus of the hypertrochoid (aka epitrochoid) is:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align} x &= (R+r)\cos\theta - \rho\cos(\frac{R+r}{r}\theta) \\ y &= (R+r)\sin\theta - \rho\sin(\frac{R+r}{r}\theta) \end{align} %]]></script> <p>The derivation is similar if the smaller circle rolls on the inside of the bigger circle, except that the angle of point <script type="math/tex">P</script> relative to the <script type="math/tex">x</script> axis, when the centre of the smaller circle is at angle <script type="math/tex">\theta</script>, is <script type="math/tex">\alpha-\theta</script> (measured in the first quadrant), as the point is now on the clockwise side rather than the counterclockwise side as the smaller circle rolls. 
So similarly, the parametric equation of the hypocycloid is:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align} x &= (R-r)\cos\theta + r\cos(\frac{R-r}{r}\theta) \\ y &= (R-r)\sin\theta - r\sin(\frac{R-r}{r}\theta) \end{align} %]]></script> <p>and the more general version, the hypotrochoid, is:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align} x &= (R-r)\cos\theta + \rho\cos(\frac{R-r}{r}\theta + \phi) \\ y &= (R-r)\sin\theta - \rho\sin(\frac{R-r}{r}\theta + \phi) \end{align} %]]></script> <p>Above, we added an angle <script type="math/tex">\phi</script> to <script type="math/tex">\alpha</script> to allow a version rotated about the origin. The shape, however, is just the same.</p> <p>Now some code. I like the animated GIF on the Wikipedia page that shows how the locus is created as the parameter <script type="math/tex">\theta</script> goes from 0 up to some big angle. Generating such an animation is indeed not hard, as we have already derived the coordinates and measures of everything we need to show. I will use Python, for its Pillow library is handy for creating such pictures. And in addition to GIF, I can also generate animated images in Google’s WebP format. Here is the code (python 3.6+ required due to type hint syntax):</p> <script src="https://gist.github.com/righthandabacus/97dff2233b37230b7c27d5a0001586bf.js"></script> <p>and this is the command to generate a hypercycloid:</p> <pre><code>python3 hypchoid.py -q 180 hyper.webp </code></pre> <p><img src="/img/hyper.webp" alt="" /></p> <p>and this is for a hypocycloid:</p> <pre><code>python3 hypchoid.py -p 50 -o hypo.webp </code></pre> <p><img src="/img/hypo.webp" alt="" /></p>Adrian S. Tamrighthandabacus@users.github.comHypercycloid, hypocycloid, and their more general versions, the hypertrochoid and hypotrochoid, are curves traced as the locus of a point on a circle rolling on a bigger circle. Like many other locus problems, it is convenient to tackle them with parametric equations.
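As a quick sanity check of the parametric equations derived above, here is a minimal sketch; the function names and the degenerate-case checks are mine, not part of the original script.

```python
import math

def epitrochoid(R, r, rho, theta):
    """Point traced by a pen at distance rho from the centre of a circle of
    radius r rolling on the outside of a circle of radius R."""
    k = (R + r) / r
    x = (R + r) * math.cos(theta) - rho * math.cos(k * theta)
    y = (R + r) * math.sin(theta) - rho * math.sin(k * theta)
    return x, y

def hypotrochoid(R, r, rho, theta, phi=0.0):
    """Same, but with the smaller circle rolling on the inside."""
    k = (R - r) / r
    x = (R - r) * math.cos(theta) + rho * math.cos(k * theta + phi)
    y = (R - r) * math.sin(theta) - rho * math.sin(k * theta + phi)
    return x, y

# with rho = r the epitrochoid is the epicycloid: at theta = 0 the pen sits
# at the point of contact (R, 0)
x, y = epitrochoid(3, 1, 1, 0.0)
assert abs(x - 3.0) < 1e-12 and abs(y) < 1e-12

# with R = 2r and rho = r the hypotrochoid degenerates to a straight line
# along the x axis (y is identically zero)
x, y = hypotrochoid(2, 1, 1, 0.7)
assert abs(y) < 1e-12
```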