parsertl: The Modular Parser Generator

Introduction

parsertl is a modern, modular parser generator. Currently it supports LALR(1), but eventually it may support IELR and LALR(k) in the future.

Design Goals

C++11/14

Visual C++ still doesn't fully support C++11/14 and so I have deliberatly stuck with C++03 for this library. Times have certainly changed from 2004 when I first started on the codebase that became lexertl using Visual Studio 6 and so I will keep an eye on the progress of developers dumping C++03 for good and will redo both libraries when there seems to be little to no interest in the old standard.

Examples

Build a simple parser

#include "parsertl/generator.hpp"
#include "parsertl/parser.hpp"

int main(int argc, char *argv[])
{
    parsertl::rules rules_;
    parsertl::parser<lexertl::citerator> parser_;

    try
    {
        // Calculator from flex & bison book.
        rules_.token("INTEGER");
        rules_.left("'+' '-'");
        rules_.left("'*' '/'");
        rules_.precedence("UMINUS");

        rules_.push("start", "exp");
        rules_.push("exp", "exp '+' exp");
        rules_.push("exp", "exp '-' exp");
        rules_.push("exp", "exp '*' exp");
        rules_.push("exp", "exp '/' exp");
        rules_.push("exp", "'(' exp ')'");
        rules_.push("exp", "'-' exp %prec UMINUS");
        rules_.push("exp", "INTEGER");
        parsertl::generator::build(rules_, parser_.sm);
    }
    catch (const std::exception &e)
    {
        std::cout << e.what() << '\n';
    }

    return 0;
}

Check input for validity

#include "parsertl/generator.hpp"
#include "parsertl/parser.hpp"

int main(int argc, char *argv[])
{
    parsertl::rules grules_;
    parsertl::parser<lexertl::citerator> parser_;
    lexertl::rules lrules_;
    lexertl::state_machine lsm_;

    try
    {
        // Calculator from flex & bison book.
        grules_.token("INTEGER");
        grules_.left("'+' '-'");
        grules_.left("'*' '/'");
        grules_.precedence("UMINUS");

        grules_.push("start", "exp");
        grules_.push("exp", "exp '+' exp");
        grules_.push("exp", "exp '-' exp");
        grules_.push("exp", "exp '*' exp");
        grules_.push("exp", "exp '/' exp");
        grules_.push("exp", "'(' exp ')'");
        grules_.push("exp", "'-' exp %prec UMINUS");
        grules_.push("exp", "INTEGER");
        parsertl::generator::build(grules_, parser_.sm);

        lrules_.push("[+]", grules_.token_id("'+'"));
        lrules_.push("-", grules_.token_id("'-'"));
        lrules_.push("[*]", grules_.token_id("'*'"));
        lrules_.push("[/]", grules_.token_id("'/'"));
        lrules_.push("\\d+", grules_.token_id("INTEGER"));
        lrules_.push("[(]", grules_.token_id("'('"));
        lrules_.push("[)]", grules_.token_id("')'"));
        lrules_.push("\\s+", lsm_.skip());
        lexertl::generator::build(lrules_, lsm_);

        std::string expr_("1 + 2 * -3");
        lexertl::citerator iter_(expr_.c_str(), expr_.c_str() + expr_.size(), lsm_);

        parser_.init(iter_);

        bool success_ = parser_.parse(iter_);
    }
    catch (const std::exception &e)
    {
        std::cout << e.what() << '\n';
    }

    return 0;
}

Add semantic actions

#include "parsertl/generator.hpp"
#include <iostream>
#include "lexertl/iterator.hpp"
#include "parsertl/parser.hpp"

int main()
{
    parsertl::rules grules_;
    typedef parsertl::parser<lexertl::citerator> parser;
    parser parser_;
    lexertl::rules lrules_;
    lexertl::state_machine lsm_;

    try
    {
        grules_.token("INTEGER");
        grules_.left("'+' '-'");
        grules_.left("'*' '/'");
        grules_.precedence("UMINUS");

        grules_.push("start", "exp");

        const std::size_t add_index_ = grules_.push("exp", "exp '+' exp");
        const std::size_t sub_index_ = grules_.push("exp", "exp '-' exp");
        const std::size_t mul_index_ = grules_.push("exp", "exp '*' exp");
        const std::size_t div_index_ = grules_.push("exp", "exp '/' exp");

        grules_.push("exp", "'(' exp ')'");

        const std::size_t umin_index_ = grules_.push("exp", "'-' exp %prec UMINUS");
        const std::size_t int_index_ = grules_.push("exp", "INTEGER");

        parsertl::generator::build(grules_, parser_.sm);

        lrules_.push("[+]", grules_.token_id("'+'"));
        lrules_.push("-", grules_.token_id("'-'"));
        lrules_.push("[*]", grules_.token_id("'*'"));
        lrules_.push("[/]", grules_.token_id("'/'"));
        lrules_.push("\\d+", grules_.token_id("INTEGER"));
        lrules_.push("[(]", grules_.token_id("'('"));
        lrules_.push("[)]", grules_.token_id("')'"));
        lrules_.push("\\s+", lrules_.skip());
        lexertl::generator::build(lrules_, lsm_);

        std::string expr_("1 + 2 * -3");
        lexertl::citerator iter_(expr_.c_str(), expr_.c_str() + expr_.size(), lsm_);
        parser::token_vector productions_;
        std::stack<int> stack_;

        parser_.init(iter_);

        while (parser_.entry._action != parsertl::error &&
            parser_.entry._action != parsertl::accept)
        {
            switch (parser_.entry._action)
            {
                case parsertl::error:
                    throw std::runtime_error("Parser error");
                    break;
                case parsertl::shift:
                case parsertl::go_to:
                case parsertl::accept:
                    break;
                case parsertl::reduce:
                {
                    std::size_t rule_ = parser_.reduce_id();
                
                    if (rule_ == add_index_)
                    {
                        const int rhs_ = stack_.top();

                        stack_.pop();
                        stack_.top() = stack_.top() + rhs_;
                    }
                    else if (rule_ == sub_index_)
                    {
                        const int rhs_ = stack_.top();

                        stack_.pop();
                        stack_.top() = stack_.top() - rhs_;
                    }
                    else if (rule_ == mul_index_)
                    {
                        const int rhs_ = stack_.top();

                        stack_.pop();
                        stack_.top() = stack_.top() * rhs_;
                    }
                    else if (rule_ == div_index_)
                    {
                        const int rhs_ = stack_.top();

                        stack_.pop();
                        stack_.top() = stack_.top() / rhs_;
                    }
                    else if (rule_ == umin_index_)
                    {
                        stack_.top() *= -1;
                    }
                    else if (rule_ == int_index_)
                    {
                        stack_.push(atoi(parser_.dollar(0, productions_).start));
                    }

                    break;
                }
            }

            parser_.next(iter_, productions_);
        }

        std::cout << "Result: " << stack_.top() << '\n';
    }
    catch (const std::exception &e)
    {
        std::cout << e.what() << '\n';
    }

    return 0;
}

Trace parser operation

#include "parsertl/generator.hpp"
#include <iostream>
#include "parsertl/parser.hpp"

template<typename iterator>
class shift_functor
{
public:
    shift_functor(const std::size_t integer_, std::stack<int> &values_) :
        _integer(integer_),
        _values(values_)
    {
    }

    void operator()(const iterator &iter_)
    {
        if (iter_->id == _integer)
        {
            _values.push(atoi(iter_->start));
        }
    }

private:
    const std::size_t _integer;
    std::stack<int> &_values;
};

class reduce_functor
{
public:
    std::size_t _add;
    std::size_t _subtract;
    std::size_t _multiply;
    std::size_t _divide;
    std::size_t _uminus;

    reduce_functor(std::stack<int> &values_) :
        _add(~0),
        _subtract(~0),
        _multiply(~0),
        _divide(~0),
        _uminus(~0),
        _values(values_)
    {
    }

    void operator()(const std::size_t rule_)
    {
        if (rule_ == _add)
        {
            int rhs_ = 0;

            rhs_ = _values.top();
            _values.pop();
            _values.top() += rhs_;
        }
        else if (rule_ == _subtract)
        {
            int rhs_ = 0;

            rhs_ = _values.top();
            _values.pop();
            _values.top() -= rhs_;
        }
        else if (rule_ == _multiply)
        {
            int rhs_ = 0;

            rhs_ = _values.top();
            _values.pop();
            _values.top() *= rhs_;
        }
        else if (rule_ == _divide)
        {
            int rhs_ = 0;

            rhs_ = _values.top();
            _values.pop();
            _values.top() /= rhs_;
        }
        else if (rule_ == _uminus)
        {
            _values.top() *= -1;
        }
    }

private:
    std::stack<int> &_values;
};

int main(int argc, char *argv[])
{
    parsertl::rules grules_;
    parsertl::parser<lexertl::citerator> parser_;
    parsertl::rules::string_vector symbols_;
    lexertl::rules lrules_;
    lexertl::state_machine lsm_;
    std::stack<int> values_;

    try
    {
        reduce_functor reduce_(values_);

        // Calculator from flex & bison book.
        grules_.token("INTEGER");
        grules_.left("'+' '-'");
        grules_.left("'*' '/'");
        grules_.precedence("UMINUS");

        grules_.push("start", "exp");
        reduce_._add = grules_.push("exp", "exp '+' exp");
        reduce_._subtract = grules_.push("exp", "exp '-' exp");
        reduce_._multiply = grules_.push("exp", "exp '*' exp");
        reduce_._divide = grules_.push("exp", "exp '/' exp");
        grules_.push("exp", "'(' exp ')'");
        reduce_._uminus = grules_.push("exp", "'-' exp %prec UMINUS");
        grules_.push("exp", "INTEGER");

        shift_functor<lexertl::citerator> shift_(grules_.token_id("INTEGER"), values_);

        parsertl::generator::build(grules_, parser_.sm);
        grules_.terminals(symbols_);
        grules_.non_terminals(symbols_);

        lrules_.push("[+]", grules_.token_id("'+'"));
        lrules_.push("-", grules_.token_id("'-'"));
        lrules_.push("[*]", grules_.token_id("'*'"));
        lrules_.push("[/]", grules_.token_id("'/'"));
        lrules_.push("\\d+", grules_.token_id("INTEGER"));
        lrules_.push("[(]", grules_.token_id("'('"));
        lrules_.push("[)]", grules_.token_id("')'"));
        lrules_.push("\\s+", lsm_.skip());
        lexertl::generator::build(lrules_, lsm_);

        std::string expr_("1 + 2 * -3");
        lexertl::citerator iter_(expr_.c_str(), expr_.c_str() + expr_.size(), lsm_);

        parser_.init(iter_);

        while (parser_.entry._action != parsertl::error &&
            parser_.entry._action != parsertl::accept)
        {
            // Debug info
            switch (parser_.entry._action)
            {
                case parsertl::error:
                    throw std::runtime_error("Parser error");
                    break;
                case parsertl::shift:
                    shift_(iter_);
                    std::cout << 's' << parser_.entry._param << '\n';
                    break;
                case parsertl::reduce:
                {
                    parsertl::state_machine::size_t_size_t_pair &pair_ =
                        parser_.sm._rules[parser_.entry._param];

                    reduce_(parser_.reduce_id());
                    std::cout << "reduce by " << symbols_[pair_.first] << " ->";

                    if (pair_.second.empty())
                    {
                        std::cout << " %empty";
                    }
                    else
                    {
                        for (auto iter_ = pair_.second.cbegin(),
                            end_ = pair_.second.cend(); iter_ != end_; ++iter_)
                        {
                            std::cout << ' ' << symbols_[*iter_];
                        }
                    }

                    std::cout << '\n';
                    break;
                }
                case parsertl::go_to:
                    std::cout << "goto " << parser_.entry._param << '\n';
                    break;
                case parsertl::accept:
                    std::cout << "accept\n";
                    break;
            }

            parser_.next(iter_);
        }

        if (!values_.empty())
            std::cout << '\n' << expr_ << " = " << values_.top() << '\n';
    }
    catch (const std::exception &e)
    {
        std::cout << e.what() << '\n';
    }

    return 0;
}

The supplied parser struct is small, simple and clean. I'm hoping it is flexible enough to meet most needs, but should you need to write your own lookup instead, it serves as a good tutorial.

This is very much still a work in progress, so please be sure to leave a comment below.

References

Comments for parsertl

Ben Hanson, 2017-01-28 16:20

Yes, once generated the parsertl state_machine should be treated as read only. It would therefore would be no problem to share a common sm amongst multiple custom parser instances.

I would be interested to hear how you get on and how you find parsertl to use in general!

Sudhakar B, 2017-01-14 14:36

We currently use the parsertl library to parse some of our domain specific scripts. We create multiple instances of the parsertl::parser struct to parse the scripts which have the same parser/lexer rules (grammar). I assume the member variable parser::sm (state_machine) would be the same in the all parser structs that use the same parser/lexer rules and is used read-only during parsing after building it using the function call parsertl::generator::build(parserRules, parser.sm).

Can I share the instance of state_machine across the multiple parser objects with the same parser/lexer rules? I plan to implement a new parser class based on parsertl::parser to share the state_machine object. Please let me know your thoughts.

Ben Hanson, 2015-07-18 16:56

Hi Sudhakar B

There is no built in $$ function yet. I'm thinking that the caller can maintain that state externally as required, although I'm unclear if this is the right way to go. So far I've done interfaces for lookup where you only pay for what you use.

Sudhakar B, 2015-07-18 02:29

Hi Ben

Thanks for the great work. I just started using parsertl & lexertl. Is there any equivalent dollar function to get the pseudo variable $$?

Thanks.

Submit a comment

Name:

Comment: