Mahout Decision Trees: Partial Implementation Source Code Analysis, Part 4

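With today's post, the analysis of Partial Implementation should be complete (though only for data whose attributes are categorical; numerical attributes were not analyzed). What follows is the third part of the Partial Implementation hands-on walkthrough: TestForest. The source file is in $MAHOUT_HOME/examples/src/main/java/org/apache/mahout/classifier/df/mapreduce. Opening the source, you can see: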

The main operations in TestForest are as follows:

1. Set the parameters and call the testForest() method directly. testForest() first confirms that (1) the output path does not exist; (2) the forest exists; (3) the data exists:

    // make sure the output file does not exist
    if (outputPath != null) {
      outFS = outputPath.getFileSystem(getConf());
      if (outFS.exists(outputPath)) {
        throw new IllegalArgumentException("Output path already exists");
      }
    }
    // make sure the decision forest exists
    FileSystem mfs = modelPath.getFileSystem(getConf());
    if (!mfs.exists(modelPath)) {
      throw new IllegalArgumentException("The forest path does not exist");
    }
    // make sure the test data exists
    dataFS = dataPath.getFileSystem(getConf());
    if (!dataFS.exists(dataPath)) {
      throw new IllegalArgumentException("The Test data path does not exist");
    }

2. The job is usually run with MapReduce rather than on a single machine, so the mapreduce() method is called directly. It works as follows:

It creates a Classifier and then calls its run() method directly:

Classifier classifier = new Classifier(modelPath, dataPath, datasetPath, outputPath, getConf());

classifier.run();

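3. Opening the Classifier source file, you can see that the class defines a Job whose Mapper is CMapper (a static nested class of Classifier). CMapper's setup() method mainly sets up some parameters; the map() method works as follows: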

3.1 First, convert the data, that is, turn the Text input into an Instance:

Instance instance = converter.convert(line);

3.2 Call forest.classify() directly to classify the instance:

double prediction = forest.classify(dataset, rng, instance);

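3.2.1 The forest's classify method iterates over all of the trees we built. Each tree produces one predicted class; the predictions are tallied, and the class predicted most often is returned: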

      int[] predictions = new int[dataset.nblabels()];
      for (Node tree : trees) {
        double prediction = tree.classify(instance);
        if (prediction != -1) {
          predictions[(int) prediction]++;
        }
      }

      return DataUtils.maxindex(rng, predictions);
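
The voting step can be tried without Hadoop or Mahout. The following is a minimal sketch, not Mahout's code: the class MajorityVoteSketch and its vote method are hypothetical names, the per-tree predictions are passed in as a plain array, and returning -1 when no tree votes is an assumption of this sketch. Ties are broken at random, which is one plausible reason a Random is passed to the real method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the voting loop; not part of Mahout.
final class MajorityVoteSketch {

  /**
   * Counts one vote per tree and returns the label index with the most votes.
   * A prediction of -1 means the tree could not classify and is skipped.
   * Ties are broken at random. Returns -1 if no tree voted (sketch assumption).
   */
  static int vote(double[] treePredictions, int nbLabels, Random rng) {
    int[] counts = new int[nbLabels];
    for (double p : treePredictions) {
      if (p != -1) {
        counts[(int) p]++;
      }
    }

    int max = 0;
    List<Integer> best = new ArrayList<>();
    for (int label = 0; label < nbLabels; label++) {
      if (counts[label] > max) {
        // a new strict maximum replaces all earlier candidates
        max = counts[label];
        best.clear();
        best.add(label);
      } else if (counts[label] == max && max > 0) {
        // same vote count as the current maximum: remember the tie
        best.add(label);
      }
    }
    if (best.isEmpty()) {
      return -1;
    }
    return best.get(rng.nextInt(best.size()));
  }

  public static void main(String[] args) {
    // six trees: three vote for label 1, one each for 0 and 2, one abstains
    double[] votes = {1, 0, 1, -1, 2, 1};
    System.out.println(vote(votes, 3, new Random(42)));
  }
}
```

With these six votes, label 1 has three votes and wins outright, so the random tie-break does not come into play.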

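3.2.2 Each classification walks down the tree until it reaches a leaf node. The leaf node, Leaf, simply returns the label value stored in that node: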

  public double classify(Instance instance) {
    return label;
  }
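
To make the walk down to a leaf concrete, here is a small sketch for the categorical case only, matching the scope of this series. Everything in it is an assumption made for illustration: the names TreeClassifySketch, Node, Leaf and CategoricalSplit are hypothetical, an instance is a plain double[] instead of Mahout's Instance, and returning -1 for an attribute value the node has no branch for is this sketch's convention.

```java
// Hypothetical toy tree; it only imitates the shape of the real classes.
final class TreeClassifySketch {

  /** A tree node; classify returns a label index, or -1 when no branch matches. */
  interface Node {
    double classify(double[] instance);
  }

  /** A leaf ignores the instance and returns the label it stores. */
  static final class Leaf implements Node {
    private final double label;

    Leaf(double label) {
      this.label = label;
    }

    @Override
    public double classify(double[] instance) {
      return label;
    }
  }

  /** An inner node that branches on one categorical attribute. */
  static final class CategoricalSplit implements Node {
    private final int attr;          // index of the attribute tested here
    private final double[] values;   // attribute values seen at this node
    private final Node[] children;   // children[i] handles values[i]

    CategoricalSplit(int attr, double[] values, Node[] children) {
      this.attr = attr;
      this.values = values;
      this.children = children;
    }

    @Override
    public double classify(double[] instance) {
      for (int i = 0; i < values.length; i++) {
        if (values[i] == instance[attr]) {
          return children[i].classify(instance); // descend one level
        }
      }
      return -1; // no branch for this value
    }
  }

  public static void main(String[] args) {
    Node inner = new CategoricalSplit(1, new double[] {0, 1},
        new Node[] {new Leaf(0), new Leaf(1)});
    Node root = new CategoricalSplit(0, new double[] {0, 1},
        new Node[] {new Leaf(2), inner});
    System.out.println(root.classify(new double[] {1, 0}));
  }
}
```

The recursion ends only at a Leaf, which is why the leaf's classify can be a one-line return.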

3.3 Output the original, known class as the key and the predicted class as the value:

lkey.set(dataset.getLabel(instance));
lvalue.set(Double.toString(prediction));
context.write(lkey, lvalue);
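
Steps 3.1 to 3.3 together can be imitated in plain Java. This is a rough sketch under stated assumptions, not the real mapper: MapStepSketch and mapAll are hypothetical names, each input line is assumed to be comma-separated numeric values with the label stored at labelIndex, the forest is replaced by an arbitrary function, and the "context" is just a list of key/value pairs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for the map() step; no Hadoop types are used.
final class MapStepSketch {

  /**
   * For every line: parse it into a double[] (stand-in for converter.convert),
   * classify it, and emit (actual label, predicted label as text).
   */
  static List<Map.Entry<Double, String>> mapAll(
      List<String> lines, int labelIndex, ToDoubleFunction<double[]> forest) {
    List<Map.Entry<Double, String>> out = new ArrayList<>();
    for (String line : lines) {
      String[] tokens = line.split(",");
      double[] instance = new double[tokens.length];
      for (int i = 0; i < tokens.length; i++) {
        instance[i] = Double.parseDouble(tokens[i].trim());
      }
      double prediction = forest.applyAsDouble(instance);
      // key: the known label; value: the prediction rendered as a string
      out.add(new SimpleEntry<>(instance[labelIndex], Double.toString(prediction)));
    }
    return out;
  }

  public static void main(String[] args) {
    // toy "forest": predicts 1 when the first attribute is positive
    ToDoubleFunction<double[]> toyForest = inst -> inst[0] > 0 ? 1.0 : 0.0;
    List<String> lines = Arrays.asList("1,1", "0,1");
    for (Map.Entry<Double, String> pair : mapAll(lines, 1, toyForest)) {
      System.out.println(pair.getKey() + "\t" + pair.getValue());
    }
  }
}
```

Because each output pair carries both the known label and the prediction, anything that reads the pairs afterwards can compare the two directly.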

4. Classifier then copies the mappers' output to the output path and deletes the mappers' output:

parseOutput(job);

HadoopUtil.delete(conf, mappersOutputPath);
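
The copy-then-delete pattern of step 4 can be illustrated on the local file system. This is only an analogy, not what parseOutput does internally: CollectOutputSketch and collectAndClean are hypothetical names, java.nio.file replaces the Hadoop FileSystem API, the temporary directory is assumed to be flat (files only), and treating files whose names start with "part-" as the mapper results is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-file analogy of "collect mapper output, then clean up".
final class CollectOutputSketch {

  /**
   * Copies every "part-" file from mappersOutput into output, then deletes
   * mappersOutput and everything in it. Returns the number of files copied.
   * Assumes mappersOutput contains only regular files.
   */
  static int collectAndClean(Path mappersOutput, Path output) throws IOException {
    Files.createDirectories(output);

    List<Path> files;
    try (Stream<Path> listing = Files.list(mappersOutput)) {
      files = listing.sorted().collect(Collectors.toList());
    }

    int copied = 0;
    for (Path file : files) {
      if (file.getFileName().toString().startsWith("part-")) {
        Files.copy(file, output.resolve(file.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
        copied++;
      }
      Files.delete(file); // remove the temporary file, copied or not
    }
    Files.delete(mappersOutput); // the directory is empty by now
    return copied;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempDirectory("mappers");
    Files.write(tmp.resolve("part-m-00000"), Arrays.asList("1.0\t1.0", "0.0\t1.0"));
    Path out = Files.createTempDirectory("result").resolve("predictions");
    System.out.println("copied " + collectAndClean(tmp, out) + " file(s) to " + out);
  }
}
```

Deleting the temporary directory only after the copy has finished means a failure midway leaves the mapper results in place for inspection.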

Share, enjoy, grow

Please credit the source when reposting: http://blog.csdn.net/fansy1990

Author: fansy1990, posted 2013-1-26 20:44:01
