Mahout Installation
Data Statistics and Analysis

Hostname | IP | Services |
---|---|---|
ubuntu02 | 192.168.0.152 | zookeeper, namenode, resourcemanager, jobhistoryserver, hregionserver |
ubuntu03 | 192.168.0.153 | zookeeper, datanode, nodemanager, hregionserver |
ubuntu04 | 192.168.0.154 | zookeeper, datanode, nodemanager, hregionserver |
ubuntu05 | 192.168.0.155 | secondarynamenode, datanode, nodemanager, hmaster, mahout |
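The hostnames in the table have to resolve to the listed IPs on every node. A minimal sketch, assuming name resolution goes through /etc/hosts rather than DNS (the source does not say which); `cluster_hosts` is a hypothetical helper name used only for illustration:

```shell
# Print /etc/hosts-style entries for the four nodes listed in the table.
# Assumption: hostnames are resolved through /etc/hosts, not DNS.
cluster_hosts() {
  cat << 'EOF'
192.168.0.152 ubuntu02
192.168.0.153 ubuntu03
192.168.0.154 ubuntu04
192.168.0.155 ubuntu05
EOF
}

# Show the entries; nothing is written to the system here.
cluster_hosts
```

To apply the entries on a node, the output could be appended as root with `cluster_hosts >> /etc/hosts`, after checking that the entries are not already present.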
apache-mahout-distribution-0.13.0.tar.gz
Installation
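# Download the 0.13.0 release tarball
wget -c \
https://www-us.apache.org/dist/mahout/0.13.0/apache-mahout-distribution-0.13.0.tar.gz
# Unpack it and move it under /usr/local (requires root)
tar -zxvf apache-mahout-distribution-0.13.0.tar.gz && \
mv apache-mahout-distribution-0.13.0/ /usr/local/
# Create a version-independent symlink
cd /usr/local && ln -s apache-mahout-distribution-0.13.0/ mahout
# Append the Mahout environment variables to /etc/profile
cat >> /etc/profile << EOF
# for mahout
export MAHOUT_HOME=/usr/local/mahout
export MAHOUT_CONF_DIR=\$MAHOUT_HOME/conf
export PATH=\$MAHOUT_HOME/bin:\$PATH
EOF
# Reload the profile, then run mahout with no arguments to list the available programs
source /etc/profile
mahout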
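Before moving on, it can help to confirm that the environment variables and the symlink point at a usable install. A minimal sketch; `check_mahout_home` is a hypothetical helper (not part of Mahout) that only checks for an executable launcher at `bin/mahout` under the given directory:

```shell
# Check that a Mahout install directory looks usable.
# check_mahout_home is a hypothetical helper, not part of Mahout.
# Usage: check_mahout_home [dir]   (defaults to $MAHOUT_HOME)
check_mahout_home() {
  mahout_dir="${1:-${MAHOUT_HOME:-}}"
  if [ -z "$mahout_dir" ]; then
    echo "MAHOUT_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$mahout_dir/bin/mahout" ]; then
    echo "no executable launcher at $mahout_dir/bin/mahout" >&2
    return 1
  fi
  echo "mahout launcher found: $mahout_dir/bin/mahout"
}

# Report the result without aborting the shell if the check fails.
check_mahout_home || echo "check failed: re-run 'source /etc/profile' or inspect the symlink" >&2
```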
Testing Mahout
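# Download the synthetic control chart dataset
wget -c \
http://archive.ics.uci.edu/ml/\
databases/synthetic_control/synthetic_control.data
# Upload it to HDFS under testdata in the root user's home directory
hdfs dfs -mkdir -p /user/root/testdata
hdfs dfs -put synthetic_control.data /user/root/testdata
# Run the k-means example job
mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
# Inspect the job output in HDFS
hdfs dfs -ls /user/root/output
hdfs dfs -ls /user/root/output/clusteredPoints
# Dump the clustered points and the input vectors to local text files
mahout seqdumper -i /user/root/output/clusteredPoints -o clusteredPoints
mahout vectordump -i /user/root/output/data -o raw_data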
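The local `clusteredPoints` dump can be summarized to see how many points landed in each cluster. A rough sketch, assuming each record in the seqdumper output is a single line of the form `Key: <clusterId>: Value: ...` (check this against the actual file, since the format is not shown in the source); `count_cluster_sizes` is a hypothetical helper name:

```shell
# Count clustered points per cluster ID in a seqdumper text dump.
# Assumption: each record is one line of the form "Key: <clusterId>: Value: ...".
count_cluster_sizes() {
  awk '$1 == "Key:" { id = $2; sub(/:$/, "", id); n[id]++ }
       END { for (id in n) print id, n[id] }' "$1" | sort -n
}

# Run only when the dump exists in the current directory.
if [ -f clusteredPoints ]; then
  count_cluster_sizes clusteredPoints
fi
```

Each output line is a cluster ID followed by its point count, sorted by cluster ID.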
Mahout Algorithms
# Classification
Logistic Regression
Bayesian
SVM # support vector machine
Perceptron
Neural Network
Random Forests
Restricted Boltzmann Machines
# Clustering
Canopy Clustering
K-means Clustering
Fuzzy K-means
Expectation Maximization # EM clustering
Mean Shift Clustering
Hierarchical Clustering
Dirichlet Process Clustering
Latent Dirichlet Allocation # LDA clustering
Spectral Clustering
# Association rule mining
Parallel FP Growth Algorithm
# Dimensionality reduction
Singular Value Decomposition
Principal Components Analysis
Independent Component Analysis
Gaussian Discriminative Analysis
# Recommendation / collaborative filtering
Non-distributed recommenders # Taste (UserCF, ItemCF, SlopeOne)
Distributed Recommenders # ItemCF
# Vector similarity computation
RowSimilarityJob # computes pairwise similarity between rows
VectorDistanceJob # computes distances between vectors
# Non-MapReduce algorithms
Hidden Markov Models