Hand-rolling valid JSON in MySQL using GROUP_CONCAT

Posted: May 12th, 2015 | Author: | Filed under: Tutorials | Tags: , , , , , | No Comments »

The problem

Recently, I came across what started as a straightforward problem: In MySQL I was JOINing from Table A to Table B on a’s primary key, and wanted to display a list of all items in Table A, each followed by a list of all associated items in Table B.

My initial thought was to hand-roll my own JSON, so that the result of the query would have a few columns with the results of Table A, then a column with the JSON-encoded data from Table B. I realize that one should never hand-roll their own JSON, but there isn’t a native JSON encoder in MySQL, so I had to make do.

It was easy enough to come up with a basic JSON format using the following. I’ve done some pretty aggressive tabbing and newlines here to try to make the layers of functions as straightforward as possible; different arguments to each function align vertically.

SELECT a.aId, a.name,
    CONCAT(
        '[',
        GROUP_CONCAT(
            CONCAT(
                '[',
                b.bId, ', ',
                '\"', b.name, '\"',
                ']')
            ORDER BY b.bId ASC
            SEPARATOR ','),
        ']'
    ) AS bData
FROM a
    JOIN b
        ON a.bId = b.bId
GROUP BY a.aId

(Note that I am not the first person to come up with this solution.)

However, if b.name has a quotation mark (or a variety of other characters), this creates invalid JSON. How do we ensure that we always create valid JSON, even when there are special characters or unicode/multi-byte characters?

The solution

Ensuring we always have valid JSON was surprisingly easy – we can hexadecimal-encode b.name in the MySQL query, which ensures that its character set is 0-9 or A-F, so it will always be properly contained within the quotation marks around it. We then convert it back into a normal string in the application code.

SELECT a.aId, a.name,
    CONCAT(
        '[',
        COALESCE(
            GROUP_CONCAT(
                CONCAT(
                    '[',
                    b.bId, ', ',
                    '\"', HEX(b.name), '\"',
                    ']')
                ORDER BY b.bId ASC
                SEPARATOR ','),
            ''),
        ']') AS bData
FROM a
    JOIN b
        ON a.bId = b.bId
GROUP BY a.aId

So let’s say b.name was something like '☃' (because I know people are always naming their things [unicode Snowman]), the resulting hexadecimal representation is 27E2988327. Opaque, but definitely JSON-safe!

Note that we also use COALESCE in case there are no associated items in Table B; otherwise CONCAT('[', NULL, ']' gives us NULL, not [].

All we have left to do is convert the hexadecimal representation back into text in the application. Here’s a function in PHP that does the trick:

function hexToStr($hex)
{
    $string = '';
    for ($charIter = 0; $charIter < strlen($hex) - 1; $charIter += 2)
    {
        $string .= chr(hexdec($hex[$charIter] . $hex[$charIter + 1]));
    }
    return $string;
}

Happy coding, and remember to be safe and always properly escape your strings!

Feel free to ask questions or give feedback via Twitter.