Graph search, algorithmic optimization, and word games: How to program Ruzzle

Posted: May 20th, 2013 | Author: Alex Korn | Filed under: Tutorials | Tags: algorithms, depth-first search, graph theory, PHP | No Comments »

(Disclaimer: I have no affiliation with Ruzzle.)

Introduction

I was recently introduced to the game Ruzzle, a word finding game with some Scrabble-like scoring components. If you’re not familiar with it, check out their website and watch the quick introductory video.

Ruzzle screenshot

I was fascinated by how the game calculated all the possible words on each board. So I decided to program it. Check out the Ruzzle demo page, which creates a randomized board and finds all possible words.

Overview

The way to do this search for all possible words is by viewing the letters as a directed graph where the letters are nodes and edges are connections between adjacent letters.

Ruzzle graph. This was more of a pain to make than you might expect.

With this graph, we can do a Depth-first search (DFS) for words starting at each node.

Creating the board

First, we need to create a 4×4 board of letters. In order to create reasonable boards for English, we weight the letter distribution to that of their distribution in English.

The following code creates the graph/board, assigns letters to each node/tile, and outputs the board. The basic graph objects are shown at the end of this post.

/**
 * Create graph/board
 */
 
$timeStart = microtime(true);
 
$colCount = 4;
$rowCount = 4;
 
$lettersGraph = new Graph_Directed();
/**
 * @var array Map<colId, Map<rowId, char letter>>
 */
$lettersGraphNode = [];
for ($colIter = 0; $colIter < $colCount; ++$colIter)
{
    for ($rowIter = 0; $rowIter < $rowCount; ++$rowIter)
    {
        $node = $lettersGraphNode[$colIter][$rowIter] = new Graph_Node($lettersGraph);
    }
}
 
/**
 * Attach each node to all adjacent nodes
 */
for ($colIter = 0; $colIter < $colCount; ++$colIter)
{
    for ($rowIter = 0; $rowIter < $rowCount; ++$rowIter)
    {
        $node = $lettersGraphNode[$colIter][$rowIter];
        for ($adjacentColIter = max(0, $colIter - 1);
            $adjacentColIter < min($colCount, $colIter + 1 + 1);
            ++$adjacentColIter)
        {
            for ($adjacentRowIter = max(0, $rowIter - 1);
                $adjacentRowIter < min($rowCount, $rowIter + 1 + 1);
                ++$adjacentRowIter)
            {
                // don't connect letters to themselves
                if ($colIter == $adjacentColIter &&
                        $rowIter == $adjacentRowIter)
                {
                    continue;
                }
                $node->connectFlowFrom(
                        $lettersGraphNode[$adjacentColIter][$adjacentRowIter]);
            }
        }
    }
}
 
/**
 * @see http://en.wikipedia.org/wiki/Letter_frequency#Relative_frequencies_of_letters_in_the_English_language
 */
$letterFrequency = [
    'A' => 0.08167,
    'B' => 0.01492,
    'C' => 0.02782,
    'D' => 0.04253,
    'E' => 0.12702,
    'F' => 0.02228,
    'G' => 0.02015,
    'H' => 0.06094,
    'I' => 0.06966,
    'J' => 0.00153,
    'K' => 0.00772,
    'L' => 0.04025,
    'M' => 0.02406,
    'N' => 0.06749,
    'O' => 0.07507,
    'P' => 0.01929,
    'Q' => 0.00095,
    'R' => 0.05987,
    'S' => 0.06327,
    'T' => 0.09056,
    'U' => 0.02758,
    'V' => 0.00978,
    'W' => 0.02360,
    'X' => 0.00150,
    'Y' => 0.01974,
    'Z' => 0.00074 + 0.00001, // there's some rounding in Wikipedia's list
];
$letterToMaxCumulativeFrequency = [];
$cumulativeFrequency = 0;
foreach ($letterFrequency as $letter => $frequency)
{
    $cumulativeFrequency += $frequency;
    $letterToMaxCumulativeFrequency[$letter] = $cumulativeFrequency;
}
 
foreach ($lettersGraph->getNodes() as $node)
{
    $val = mt_rand(0, mt_getrandmax()) / mt_getrandmax();
    foreach ($letterToMaxCumulativeFrequency as $letter => $maxFrequency)
    {
        if ($val < $maxFrequency)
        {
            $node->letter = $letter;
            break;
        }
    }
}
 
/**
 * @param array $lettersGraphNode
 * @return string/HC
 */
function printTileLetters(array $lettersGraphNode)
{
    $result = '<table><tbody>';
    foreach ($lettersGraphNode as $colData)
    {
        $result .= '<tr>';
        foreach ($colData as $rowData)
        {
            $result .= '<td>' . $rowData->letter . '</td>';
        }
        $result .= '</tr>';
    }
    $result .= '</tbody></table>';
    return $result;
}
 
echo printTileLetters($lettersGraphNode);

/** * Create graph/board */ $timeStart = microtime(true); $colCount = 4; $rowCount = 4; $lettersGraph = new Graph_Directed(); /** * @var array Map<colId, Map<rowId, char letter>> */ $lettersGraphNode = []; for ($colIter = 0; $colIter < $colCount; ++$colIter) { for ($rowIter = 0; $rowIter < $rowCount; ++$rowIter) { $node = $lettersGraphNode[$colIter][$rowIter] = new Graph_Node($lettersGraph); } } /** * Attach each node to all adjacent nodes */ for ($colIter = 0; $colIter < $colCount; ++$colIter) { for ($rowIter = 0; $rowIter < $rowCount; ++$rowIter) { $node = $lettersGraphNode[$colIter][$rowIter]; for ($adjacentColIter = max(0, $colIter - 1); $adjacentColIter < min($colCount, $colIter + 1 + 1); ++$adjacentColIter) { for ($adjacentRowIter = max(0, $rowIter - 1); $adjacentRowIter < min($rowCount, $rowIter + 1 + 1); ++$adjacentRowIter) { // don't connect letters to themselves if ($colIter == $adjacentColIter && $rowIter == $adjacentRowIter) { continue; } $node->connectFlowFrom( $lettersGraphNode[$adjacentColIter][$adjacentRowIter]); } } } } /** * @see http://en.wikipedia.org/wiki/Letter_frequency#Relative_frequencies_of_letters_in_the_English_language */ $letterFrequency = [ 'A' => 0.08167, 'B' => 0.01492, 'C' => 0.02782, 'D' => 0.04253, 'E' => 0.12702, 'F' => 0.02228, 'G' => 0.02015, 'H' => 0.06094, 'I' => 0.06966, 'J' => 0.00153, 'K' => 0.00772, 'L' => 0.04025, 'M' => 0.02406, 'N' => 0.06749, 'O' => 0.07507, 'P' => 0.01929, 'Q' => 0.00095, 'R' => 0.05987, 'S' => 0.06327, 'T' => 0.09056, 'U' => 0.02758, 'V' => 0.00978, 'W' => 0.02360, 'X' => 0.00150, 'Y' => 0.01974, 'Z' => 0.00074 + 0.00001, // there's some rounding in Wikipedia's list ]; $letterToMaxCumulativeFrequency = []; $cumulativeFrequency = 0; foreach ($letterFrequency as $letter => $frequency) { $cumulativeFrequency += $frequency; $letterToMaxCumulativeFrequency[$letter] = $cumulativeFrequency; } foreach ($lettersGraph->getNodes() as $node) { $val = mt_rand(0, mt_getrandmax()) / mt_getrandmax(); foreach ($letterToMaxCumulativeFrequency as $letter => $maxFrequency) { if ($val < $maxFrequency) { $node->letter = $letter; break; } } } /** * @param array $lettersGraphNode * @return string/HC */ function printTileLetters(array $lettersGraphNode) { $result = '<table><tbody>'; foreach ($lettersGraphNode as $colData) { $result .= '<tr>'; foreach ($colData as $rowData) { $result .= '<td>' . $rowData->letter . '</td>'; } $result .= '</tr>'; } $result .= '</tbody></table>'; return $result; } echo printTileLetters($lettersGraphNode);

Graph search, algorithmic complexity, and optimization

First, we need to load and parse the dictionary. I’ve found this dictionary works well.

/**
 * Load/parse dictionary
 */
 
$dictionaryList = file(dirname(dirname(__FILE__)) . '/data/dictionary.txt');
$dictionaryMap = [];
foreach ($dictionaryList as $word)
{
    $dictionaryMap[strtoupper(trim($word))] = true;
}
 
/**
 * Set up first letters array, which helps us optimize the otherwise-slow
 * DFS for words.
 */
 
$dictionaryFirstNLetters = [];
define('MIN_FIRST_LETTERS_LENGTH', 4);
define('MAX_FIRST_LETTERS_LENGTH', 10);
 
foreach ($dictionaryList as $word)
{
    $word = strtoupper(trim($word));
    for ($length = MIN_FIRST_LETTERS_LENGTH; $length < MAX_FIRST_LETTERS_LENGTH; ++$length)
    {
        $firstNLetters = substr($word, 0, $length);
        if ($firstNLetters != $word)
        {
            $dictionaryFirstNLetters[$firstNLetters] = true;
        }
        else
        {
            break;
        }
    }
}
 
echo "Dictionary loaded/parsed in " . number_format(microtime(true) - $timeStart, 3) . "s<br><br>\n";

Note the variable $dictionaryFirstNLetters. We use this to optimize our DFS by allowing us to “bail out” early in our search if we see that there are no words that start with our $currentResult. If you try taking out this code, you’ll see that this algorithm can take an incredibly long time to run – even with a $maxDepth of 6 characters (only find words with up to 6 characters), the runtime can easily reach a minute. It increases quickly as $maxDepth is increased.

Below is our optimized depth-first search.

/**
 * @param Zeal_Graph_Node $source
 * @param int $maxDepth
 * @param array $dictionaryMap
 * @param array $dictionaryFirstNLetters
 * @param array $previouslyVisitedNodeIds
 * @param string $currentResult
 * @param array $results (By reference)
 * @return array
 */
function depthFirstSearch(Zeal_Graph_Node $source, $maxDepth,
        array $dictionaryMap, array $dictionaryFirstNLetters,
        $previouslyVisitedNodeIds = [], $currentResult = '', array &$results = [])
{
    if (strlen($currentResult) >= $maxDepth)
    {
        return [];
    }
    $currentResult .= $source->letter;
    if (isset($dictionaryMap[$currentResult]))
    {
        $results[] = $currentResult;
    }
    // bail out if there are no words that start with these letters
    if (strlen($currentResult) >= MIN_FIRST_LETTERS_LENGTH &&
        strlen($currentResult) <= MAX_FIRST_LETTERS_LENGTH &&
            !isset($dictionaryFirstNLetters[$currentResult]))
    {
        return [];
    }
    $previouslyVisitedNodeIds[] = $source->id();
    foreach ($source->outgoingEdges as $edge)
    {
        $destination = $edge->getNodeTo();
        // can't repeat letters
        if (in_array($destination->id(), $previouslyVisitedNodeIds))
        {
            continue;
        }
        $results += depthFirstSearch($destination, $maxDepth,
                $dictionaryMap, $dictionaryFirstNLetters, $previouslyVisitedNodeIds, $currentResult, $results);
    }
    return $results;
}
 
/**
 * Iterate over each tile and find all words starting with that letter
 */
$results = [];
for ($colIter = 0; $colIter < $colCount; ++$colIter)
{
    for ($rowIter = 0; $rowIter < $rowCount; ++$rowIter)
    {
        $results = array_merge(
            $results,
            depthFirstSearch(
                $lettersGraphNode[$colIter][$rowIter], $colCount * $rowCount,
                $dictionaryMap, $dictionaryFirstNLetters)
        );
    }
}
 
echo "<br>Stats\n<ul><li>Before uniquing: " . number_format(count($results)) . "</li>\n";
$resultsUnique = array_unique($results);
echo "<li>After uniquing: " . number_format(count($resultsUnique)) . "</li></ul>\n";
echo 'Words found<ul><li>' . implode("</li><li>\n", $resultsUnique) . '</li></ul>';
 
echo "\n" . number_format(microtime(true) - $timeStart, 2) . "s\n";

/** * @param Zeal_Graph_Node $source * @param int $maxDepth * @param array $dictionaryMap * @param array $dictionaryFirstNLetters * @param array $previouslyVisitedNodeIds * @param string $currentResult * @param array $results (By reference) * @return array */ function depthFirstSearch(Zeal_Graph_Node $source, $maxDepth, array $dictionaryMap, array $dictionaryFirstNLetters, $previouslyVisitedNodeIds = [], $currentResult = '', array &$results = []) { if (strlen($currentResult) >= $maxDepth) { return []; } $currentResult .= $source->letter; if (isset($dictionaryMap[$currentResult])) { $results[] = $currentResult; } // bail out if there are no words that start with these letters if (strlen($currentResult) >= MIN_FIRST_LETTERS_LENGTH && strlen($currentResult) <= MAX_FIRST_LETTERS_LENGTH && !isset($dictionaryFirstNLetters[$currentResult])) { return []; } $previouslyVisitedNodeIds[] = $source->id(); foreach ($source->outgoingEdges as $edge) { $destination = $edge->getNodeTo(); // can't repeat letters if (in_array($destination->id(), $previouslyVisitedNodeIds)) { continue; } $results += depthFirstSearch($destination, $maxDepth, $dictionaryMap, $dictionaryFirstNLetters, $previouslyVisitedNodeIds, $currentResult, $results); } return $results; } /** * Iterate over each tile and find all words starting with that letter */ $results = []; for ($colIter = 0; $colIter < $colCount; ++$colIter) { for ($rowIter = 0; $rowIter < $rowCount; ++$rowIter) { $results = array_merge( $results, depthFirstSearch( $lettersGraphNode[$colIter][$rowIter], $colCount * $rowCount, $dictionaryMap, $dictionaryFirstNLetters) ); } } echo "<br>Stats\n<ul><li>Before uniquing: " . number_format(count($results)) . "</li>\n"; $resultsUnique = array_unique($results); echo "<li>After uniquing: " . number_format(count($resultsUnique)) . "</li></ul>\n"; echo 'Words found<ul><li>' . implode("</li><li>\n", $resultsUnique) . '</li></ul>'; echo "\n" . number_format(microtime(true) - $timeStart, 2) . "s\n";

What’s next

This crash-course in Ruzzle programming doesn’t address the scoring, graphics, or actual gameplay. But these would all be fairly easy to add to the basic chasis that we’ve created here. I’ll leave that as an exercise to the reader 🙂

Reference

The graph code I use is straightforward, and given below.

Directed graph

/**
 * A directed graph.
 * - http://en.wikipedia.org/wiki/Directed_graph
 */
class Graph_Directed
{
    /**************************************************************************
     * @section ID generators
     */
 
    /**
     * @var int
     */
    protected $_maxNodeId = 0;
    /**
     * @param Graph_Node $node
     * @return int
     */
    public final function getNextNodeId(Graph_Node $node)
    {
        ++$this->_maxNodeId;
        $this->_nodes[$this->_maxNodeId] = $node;
        return $this->_maxNodeId;
    }
 
    /**
     * @var int
     */
    protected $_maxEdgeId = 0;
    /**
     * @param Graph_Edge $edge
     * @return int
     */
    public final function getNextEdgeId(Graph_Edge $edge)
    {
        ++$this->_maxEdgeId;
        if (!($edge instanceof ipGraph_EdgeResidual))
        {
            $this->_edges[$this->_maxEdgeId] = $edge;
        }
        return $this->_maxEdgeId;
    }
 
    /**************************************************************************
     * @section Edges, Nodes getters
     */
 
    /**
     * @var array Map<int id, Zeal_Graph_Node> $_nodes
     */
    protected $_nodes = [];
    /**
     * Returns nodes
     * @param none
     * @return Map<int id, Zeal_Graph_Node>
     */
    public final function getNodes()
    {
        return $this->_nodes;
    }
    /**
     * @param none
     * @return int 
     */
    public final function getNodeCount()
    {
        return count($this->_nodes);
    }
 
    /**
     * @var array Map<int id, Graph_Edge $_edges
     */
    protected $_edges = [];
    /**
     * @param none
     * @return array Map<edgeId, Graph_Edge> 
     */
    public final function getEdges()
    {
        return $this->_edges;
    }
    /**
     * @param none
     * @return int
     */
    public final function getEdgeCount()
    {
        return count($this->_edges);
    }
 
}

Node

/**
 * A node/vertex in a graph.
 * http://en.wikipedia.org/wiki/Graph_(mathematics)
 */
class Graph_Node
{
    /**
     * @var Graph_Directed The graph to which this node belongs
     */
    protected $_graph;
 
    /**
     * @param Graph_Directed $graph 
     */
    public function __construct(Graph_Directed $graph)
    {
        $this->_id = $graph->getNextNodeId($this);
        $this->_graph = $graph;
    }
 
    /**
     * @var int Unique across the parent Graph_Directed
     */
    protected $_id;
    /**
     * @param none
     * @return int
     */
    public function id()
    {
        return $this->_id;
    }
 
    /**
     * @param Graph_Node $node
     * @return Graph_Edge
     */
    public function connectFlowFrom(Graph_Node $node)
    {
        $edge = new Graph_Edge($this->_graph, $node, $this);
        return $edge;
    }
 
}

Edge

/**
 * A directed edge in a directed graph.
 */
class Graph_Edge
{
    /**
     * @var int Unique across the graph
     */
    protected $_id;
 
    /**
     * @var Graph_Node $_nodeFrom
     */
    protected $_nodeFrom;
    /**
     * @var Graph_Node $_nodeTo
     */
    protected $_nodeTo;
 
    /**
     * @var Graph_Directed
     */
    protected $_graph;
 
    /**
     * @param Graph_Directed $graph
     * @param Graph_Node $nodeFrom
     * @param Graph_Node $nodeTo
     */
    public function __construct(Graph_Directed $graph,
            Graph_Node $nodeFrom, Graph_Node $nodeTo)
    {
        $this->_id = $graph->getNextEdgeId($this);
        $this->_graph = $graph;
        $this->_nodeFrom = $nodeFrom;
        $nodeFrom->outgoingEdges[] = $this;
        $this->_nodeTo = $nodeTo;
    }
 
    /**
     * @param none
     * @return int
     */
    public function id()
    {
        return $this->_id;
    }
 
    /**
     * @param none
     * @return Graph_Node
     */
    public function getNodeTo()
    {
        return $this->_nodeTo;
    }
 
    /**
     * @param none
     * @return Graph_Node
     */
    public function getNodeFrom()
    {
        return $this->_nodeFrom;
    }
 
    /**
     * @param none
     * @return Graph_Directed
     */
    public function getGraph()
    {
        return $this->_graph;
    }
 
}

Feel free to ask questions or give feedback via Twitter.

Alex Korn