Parsing PHP Doc Comments into a Data Structure

I am surprised this wasn't mentioned yet: what about using Zend_Reflection of Zend Framework? This may come in handy especially if you work with a software built on Zend Framework like Magento.

See the Zend Framework Manual for some code examples and the API Documentation for the available methods.

There are different ways to do this:

  • Pass a file name to Zend_Reflection_File.
  • Pass an object to Zend_Reflection_Class.
  • Pass an object and a method name to Zend_Reflection_Method.
  • If you really only have the comment string at hand, you even could throw together the code for a small dummy class, save it to a temporary file and pass that file to Zend_Reflection_File.

Let's go for the simple case and assume you have an existing class you want to inspect.

The code would be like this (untested, please forgive me):

$method = new Zend_Reflection_Method($class, 'yourMethod');
$docblock = $method->getDocBlock();

if ($docBlock->hasTag('return')) {
    $tagReturn = $docBlock->getTag('return'); // $tagReturn is an instance of Zend_Reflection_Docblock_Tag_Return
    echo "Returns a: " . $tagReturn->getType() . "<br>";
    echo "Comment for return type: " . $tagReturn->getDescription();
}

You can use the "DocBlockParser" class from the Fabien Potencier Sami ("Yet Another PHP API Documentation Generator") open-source project.
First of all, get Sami from GitHub.
This is an example of how to use it:

<?php

require_once 'Sami/Parser/DocBlockParser.php';
require_once 'Sami/Parser/Node/DocBlockNode.php';

class TestClass {
    /**
     * This is the short description.
     *  
     * This is the 1st line of the long description 
     * This is the 2nd line of the long description 
     * This is the 3rd line of the long description   
     *  
     * @param bool|string $foo sometimes a boolean, sometimes a string (or, could have just used "mixed")
     * @param bool|int $bar sometimes a boolean, sometimes an int (again, could have just used "mixed") 
     * @return string de-html_entitied string (no entities at all)
     */
    public function another_test($foo, $bar) {
        return strtr($foo,array_flip(get_html_translation_table(HTML_ENTITIES)));
    }
}

use Sami\Parser\DocBlockParser;
use Sami\Parser\Node\DocBlockNode;

try {
    $method = new ReflectionMethod('TestClass', 'another_test');
    $comment = $method->getDocComment();
    if ($comment !== FALSE) {
        $dbp = new DocBlockParser();
        $doc = $dbp->parse($comment);
        echo "\n** getDesc:\n";
        print_r($doc->getDesc());
        echo "\n** getTags:\n";
        print_r($doc->getTags());
        echo "\n** getTag('param'):\n";
        print_r($doc->getTag('param'));
        echo "\n** getErrors:\n";
        print_r($doc->getErrors());
        echo "\n** getOtherTags:\n";
        print_r($doc->getOtherTags());
        echo "\n** getShortDesc:\n";
        print_r($doc->getShortDesc());
        echo "\n** getLongDesc:\n";
        print_r($doc->getLongDesc());
    }
} catch (Exception $e) {
    print_r($e);
}

?>

And here is the output of the test page:

** getDesc:
This is the short description.

This is the 1st line of the long description 
This is the 2nd line of the long description 
This is the 3rd line of the long description
** getTags:
Array
(
    [param] => Array
        (
            [0] => Array
                (
                    [0] => Array
                        (
                            [0] => Array
                                (
                                    [0] => bool
                                    [1] => 
                                )

                            [1] => Array
                                (
                                    [0] => string
                                    [1] => 
                                )

                        )

                    [1] => foo
                    [2] => sometimes a boolean, sometimes a string (or, could have just used "mixed")
                )

            [1] => Array
                (
                    [0] => Array
                        (
                            [0] => Array
                                (
                                    [0] => bool
                                    [1] => 
                                )

                            [1] => Array
                                (
                                    [0] => int
                                    [1] => 
                                )

                        )

                    [1] => bar
                    [2] => sometimes a boolean, sometimes an int (again, could have just used "mixed")
                )

        )

    [return] => Array
        (
            [0] => Array
                (
                    [0] => Array
                        (
                            [0] => Array
                                (
                                    [0] => string
                                    [1] => 
                                )

                        )

                    [1] => de-html_entitied string (no entities at all)
                )

        )

)

** getTag('param'):
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => Array
                        (
                            [0] => bool
                            [1] => 
                        )

                    [1] => Array
                        (
                            [0] => string
                            [1] => 
                        )

                )

            [1] => foo
            [2] => sometimes a boolean, sometimes a string (or, could have just used "mixed")
        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => Array
                        (
                            [0] => bool
                            [1] => 
                        )

                    [1] => Array
                        (
                            [0] => int
                            [1] => 
                        )

                )

            [1] => bar
            [2] => sometimes a boolean, sometimes an int (again, could have just used "mixed")
        )

)

** getErrors:
Array
(
)

** getOtherTags:
Array
(
)

** getShortDesc:
This is the short description.
** getLongDesc:
This is the 1st line of the long description 
This is the 2nd line of the long description 
This is the 3rd line of the long description