Efficient algorithms for computing the best subset regression models for large-scale problems
Several strategies for computing the best subset regression models are proposed. Some of the algorithms are modified versions of existing regression-tree methods, while others appear for the first time. The first algorithm selects the best subset models within a given size range. It uses a reduced search space and is found to outperform the existing branch-and-bound algorithm computationally. The properties and computational aspects of the proposed algorithm are discussed in detail. The second new algorithm preorders the variables inside the regression tree. A radius is defined to measure the distance of a node from the root of the tree, and the algorithm applies the preordering to all nodes whose distance is smaller than an a priori given radius. An efficient approach to preordering the variables is employed. The experimental results indicate that the algorithm performs best when the preordering radius lies between one quarter and one third of the number of variables. With such a radius, the algorithm has been applied to large-scale subset selection problems that are considered computationally infeasible by conventional exhaustive selection methods. A class of new heuristic strategies is also proposed. The most important assigns a different tolerance value to each subset model size. This strategy subsumes the exhaustive and heuristic subset selection strategies, and it can also be used to investigate submodels of noncontiguous size ranges. Its implementation provides a flexible tool for tackling large-scale models.
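The two central ideas of the abstract can be illustrated with a minimal sketch: a branch-and-bound search over subsets that prunes a subtree when fitting even all of its remaining variables cannot beat the incumbent model of any attainable size, combined with a per-size tolerance that relaxes this cut. This is an illustrative simplification, not the paper's dropping-column regression-tree algorithm; the function names (`rss`, `best_subsets`) and the `tol` dictionary are hypothetical, and the pruning bound relies only on the standard fact that the residual sum of squares is non-increasing as variables are added.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit on columns `cols`."""
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    r = y - X[:, cols] @ beta
    return float(r @ r)

def best_subsets(X, y, tol=None):
    """Find the lowest-RSS subset of each size 1..p.

    `tol` maps a subset size to a relative tolerance: an empty/zero map
    gives a search equivalent to exhaustive enumeration, while positive
    per-size tolerances cut more of the tree and yield a heuristic
    (possibly suboptimal) search, mimicking the size-dependent tolerance
    strategy described in the abstract.
    """
    p = X.shape[1]
    tol = tol or {}
    best = {k: (np.inf, None) for k in range(1, p + 1)}

    def search(chosen, remaining):
        k = len(chosen)
        if k:
            r = rss(X, y, list(chosen))
            if r < best[k][0]:
                best[k] = (r, tuple(chosen))
        if not remaining:
            return
        # Lower bound: RSS never increases when variables are added, so the
        # fit on chosen + all remaining bounds every descendant of this node.
        lb = rss(X, y, list(chosen) + list(remaining))
        if all(lb >= (1.0 + tol.get(j, 0.0)) * best[j][0]
               for j in range(k + 1, k + len(remaining) + 1)):
            return  # prune the whole subtree
        for i, v in enumerate(remaining):
            search(chosen + [v], remaining[i + 1:])

    search([], list(range(p)))
    return best
```

For example, on data generated as `y = X[:, 0] + 2 * X[:, 2] + noise`, the exact search (`tol=None`) recovers `(2,)` as the best single-variable model and `(0, 2)` as the best pair, while passing e.g. `tol={3: 0.1, 4: 0.1}` trades optimality for speed at those sizes.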