SPO600 Project Stage 3 - Identify clones of multiple functions and compare them

 In the previous parts, I created a GCC pass to identify function clones and compare them. 

Now, I would like to modify the pass so that it could handle program that has multiple functions which are cloned. Moreover, I personally want to organize the static variables. Instead of using parallel vectors/multiples map to store the characteristics of the clone variant1, I would like to use a struct to store that of clone of variant 1 of every cloned function.

Step 1. Revise the code to handle program with multiple functions cloned in x86 server

Here is the revised pass. The pass and critical files are also uploaded to GitHub.

/* Test Pass
        Jeff Yau, Seneca Polytechnic College
        Student ID :142466234
        Modelled on tree-nrv.cc and tree-ctyler.cc by Chris Tyler, Seneca Polytechnic College, 2024-11

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

#include <map>
#include <vector>
#include <string>
#include <stdlib.h>
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "tree-pass.h"
#include "ssa.h"
#include "tree-pretty-print.h"
#include "gimple-iterator.h"
#include "gimple-walk.h"
#include "internal-fn.h"
#include "gimple-pretty-print.h"
#include "cgraph.h"
#include "gimple-ssa.h"
#include "attribs.h"
#include "pretty-print.h"
#include "tree-inline.h"
#include "intl.h"
#include "function.h"
#include "basic-block.h"
#include "gcc-plugin.h"

namespace {
const pass_data pass_data_jeff =
{
  GIMPLE_PASS, /* type */
  "jeff", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_NONE, /* tv_id */
  PROP_cfg, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

struct gimple_data{
    int m_bb_count = 0;
    int m_stmt_count = 0;
    std::vector<enum gimple_code> m_gimple_code;
    std::vector<unsigned> m_operands_count;
    std::vector<enum tree_code> m_operandsType;
    std::map<int,int> m_variableMap;
};

class pass_jeff : public gimple_opt_pass
{
public:
  pass_jeff (gcc::context *ctxt)
    : gimple_opt_pass (pass_data_jeff, ctxt)
  {}

  /* opt_pass methods: */
  bool gate (function *) final override { return 1; }

  unsigned int execute (function *) final override;

}; // class pass_jeff


int process_gimple_variables(std::map<tree, int>& refMap, int index, gimple* stmt, std::map<int, int>& resultMap) {
    std::vector<tree> vars;

    switch (gimple_code(stmt)) {
        case GIMPLE_ASSIGN: {
            tree lhs = gimple_assign_lhs(stmt);
            vars.push_back(lhs);

            int rhs_ops = gimple_num_ops(stmt) - 1;
            for (int i = 1; i <= rhs_ops; ++i) {
                tree op = gimple_op(stmt, i);
                vars.push_back(op);
            }
            break;
        }

        case GIMPLE_DEBUG_BIND: {
            tree var = gimple_debug_bind_get_var(stmt);
            tree val = gimple_debug_bind_get_value(stmt);
            if (var)
                vars.push_back(var);
            if (val)
                vars.push_back(val);
            break;
        }

        case GIMPLE_COND: {
            tree lhs = gimple_cond_lhs(stmt);
            tree rhs = gimple_cond_rhs(stmt);
            if (lhs)
                vars.push_back(lhs);
            if (rhs)
                vars.push_back(rhs);
            break;
        }
        default:
            break;
    }

    for (tree var : vars) {
        std::map<tree, int>::iterator it = refMap.find(var);
        if (it != refMap.end()) {
            resultMap[index] = it->second;
        } else {
            refMap[var] = index;
            resultMap[index] = index;
        }
        ++index;
    }

    return index;
}

unsigned int pass_jeff::execute(function *func)
{
        static bool first = true;
        static std::map<std::string, gimple_data> function_data_map;
        struct cgraph_node *node;
        std::string base_function;
        if (!dump_file ||!func || !func ->cfg)
                return 0;
        if (first){
                FOR_EACH_DEFINED_FUNCTION(node)
                {
                        const char *fname = node->name();
                        if (!fname)
                            continue;   //ignore function with no name

                        std::string name(fname);

                        if ( name.find(".resolver") != std::string::npos)
                        {
                            std::string base_function_name;
                            base_function_name = name.substr(0, name.find(".resolver"));
                            function_data_map[base_function_name]= gimple_data();
                            fprintf(dump_file, "Found resolver of function: %s\n", base_function_name.c_str());
                        }
                }
                first = false;
        }
        node = cgraph_node::get(func->decl);
        if (!node){
            fprintf(dump_file, "ERROR: cgraph_node::get(func->decl)\n");
            return 0;
        }
        const char *fname = node->name();
        std::string name(fname);
        if ( name.find(".resolver") !=std::string::npos)
            return 0;
        base_function = name.substr(0, name.find("."));
        if (function_data_map.find(base_function) == function_data_map.end()){
            return 0; //this function is not cloned.
        }
        gimple_data &reference = function_data_map[base_function];

        gimple_data current;
        std::map<tree, int> refMap;
        basic_block bb;
        int index = 0;

        FOR_EACH_BB_FN(bb, func)
        {
                current.m_bb_count++;
                for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)){
                        current.m_stmt_count++;
                }
        }

        if (reference.m_bb_count == 0 && reference.m_stmt_count == 0){
                fprintf(dump_file,"recording-------------\n");
                FOR_EACH_BB_FN(bb,func){
                        for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)){
                            gimple *stmt = gsi_stmt(gsi);
                            index = process_gimple_variables(refMap, index, stmt, current.m_variableMap);
                                current.m_gimple_code.push_back(gimple_code(stmt));
                                current.m_operands_count.push_back(gimple_num_ops(stmt));
                                if (is_gimple_assign(stmt))
                                        current.m_operandsType.push_back(gimple_assign_rhs_code(stmt));
                        }
                }
                function_data_map[base_function] = current;
        }
        else
        {
                fprintf(dump_file, "Comparing-------------------\n");
                if (current.m_bb_count != reference.m_bb_count || current.m_stmt_count != reference.m_stmt_count){
                        fprintf(dump_file, " NOPRUNE: %s\n",name.c_str());
                        return 0;
                }
                std::vector<enum gimple_code>::iterator it = reference.m_gimple_code.begin();
                std::vector<unsigned>::iterator count_it = reference.m_operands_count.begin();
                std::vector<enum tree_code>::iterator optype_it = reference.m_operandsType.begin();
                std::map<int,int> var2_variableMap;
                index = 0;
                FOR_EACH_BB_FN(bb, func){
                        for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)){
                                gimple *stmt = gsi_stmt(gsi);
                                index = process_gimple_variables(refMap, index, stmt, var2_variableMap);
                                if (gimple_code(stmt) != *it
                                        ||gimple_num_ops(stmt) != *count_it)
                                {
                                        fprintf(dump_file, " NOPRUNE: %s\n", name.c_str());
                                        return 0;
                                }
                                if( *it == GIMPLE_ASSIGN && *(optype_it++) != gimple_assign_rhs_code(stmt))
                                {
                                        fprintf(dump_file, " NOPRUNE: %s\n",name.c_str());
                                        return 0;
                                }
                                ++it;
                                ++count_it;
                        }
                }
                std::map<int, int>::const_iterator it1 = reference.m_variableMap.begin();
                std::map<int, int>::const_iterator it2 = var2_variableMap.begin();

                while (it1 != reference.m_variableMap.end()) {
                        if (it1->first != it2->first || it1->second != it2->second)
                        {
                                fprintf(dump_file, " NOPRUNE: %s\n",name.c_str());
                                return 0;
                        }
                        ++it1;
                        ++it2;
                }
           fprintf(dump_file, " PRUNE: %s\n",name.c_str());
        }

    return 0;
}


} // anon namespace

gimple_opt_pass *
make_pass_jeff (gcc::context *ctxt)
{
  return new pass_jeff (ctxt);
}

Let's test it with the previous test cases which has only one function being cloned.


The codeworks perfectly. It shows NOPRUNE/ PRUNE correctly.

Step 2. Add a PRUNE function to the test case (in x86 server)

Now, I added a PRUNE function "print_median" to the test case "clone-test-core.c". As a result, that program will two cloned functions. One is the newly added one, which is always prune. The other one is the previous one, which can be prune / no-prune in different output file. Revised "clone-test-core.c" is uploaded to GitHub.

PRUNE function added:
CLONE_ATTRIBUTE
void print_median(int16_t *buff, size_t samples) {
    printf("median: %d\n", buff[samples / 2]);
}
The test output (1 no-prune + 1 prune):

It works perfectly. The original no-prune function marked as NOPRUNE and the new prune function is marked as PRUNE



I also looked at the program of two prune functions. It also worked perfectly. Both functions are marked as PRUNE. Display as below:



Step 3. Duplicate the test setup on AArch64 server.

I copied the revised pass and test case program "clone-tree-core-c" to the AArch64 server and duplicated the test. It works perfectly.

The test output (1 no-prune + 1 prune):





Good. The original no-prune function marked as NOPRUNE and the new prune function is marked as PRUNE


The test output (2 prune):




Good. Both functions are marked as PRUNE.


Reflection

Pass limitation and capabilities :

First, my pass only works with the GIMPLE intermediate representation and only analyzes C programs. It does not modify or optimize the program in any way. It processes every function regardless of whether it is ever called. The purpose of the pass is to determine whether a clone can be pruned, so it uses dump_file to output the results and only operates on functions that have a body.

The pass can now handle multiple cloned functions, but each clone is only compared with the first detected clone of the same name. The pass is able to compare two clones based on the total number of basic blocks and Gimple statements. It then compares the types of Gimple statements. If a statement is an assignment, it compares the operands on the right-hand side.

Regarding variable comparison, the pass only processes Gimple statements of types: GIMPLE_ASSIGN, GIMPLE_COND, and GIMPLE_DEBUG. It may not behave correctly if other types of GIMPLE statements appear. 

Finally, it compares the appearance pattern of variables in GIMPLE statements of the following types of GIMPLE_ASSIGN, GIMPLE_COND, and GIMPLE_DEBUG_BIND.

If a clone is identical to the first clone detected with same function name, it will be marked as PRUNE. Otherwise, it will be marked as NOPRUNE. 

My test cases in both x86 and AArch64 server includes:-
1. only 1 PRUNE function
2. only 1 NOPRUNE function
3. 1 PRUNE function + 1 NOPRUNE function
4. 2 PRUNE function 

What I have experienced and learn particular in stage 3:

The code I wrote in Stage 2 was not very organized, as I was adding small pieces of code to test and learn how to compare two clones. I really wanted to re-write it, but I ran out of time since it was already late in the semester. Fortunately, Project Stage 3 gave me a chance to organize my code. I grouped all the characteristics of a clone into a struct, so they can now be compared simply by accessing that struct.

Also, this time, I took the opportunity to study how to create test cases by cloning functions, which I didn’t have enough time to explore in detail during the earlier stages. This project has not been the easiest, due to the poor documentation of GCC passes, long compile times, and the complicated process of testing the passes, along with the mixed C/C++ code in the pass. Through this experience, I’ve come to understand the challenges faced by GCC developers and deeply appreciate their work in optimizing the compilation of GCC programs.


Comments

Popular posts from this blog

SPO600 Project Stage 1 (Pt.1) - Create a GCC Pass

SPO600 Project Stage 2 (Pt.1) - GCC pass locating clone function

SPO600 Project Stage 2 (Pt.2) - GCC pass locating clone function -modified version