Follow instructions at: http://docs.ceph.com/docs/luminous/rados/deployment/
It's important to update ceph-deploy first:
pip install --user ceph-deploy
sudo pip install -U ceph-deploy
The release should be specified explicitly (--release is a flag of ceph-deploy install):
ceph-deploy install --release luminous {host [host], ...}
In my case the keys needed to be re-created after the deployment tasks:
sudo ceph-create-keys --verbose --id node1
ceph-deploy gatherkeys node1 node2 node3
sudo ceph osd pool create hadoop1 100
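A quick sanity check that the cluster is healthy and the pool exists:
sudo ceph -s
sudo ceph osd pool ls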
radosgw-admin user create --uid=michal --display-name="Michal Z" [email protected]
radosgw-admin user info --uid=michal
The output will contain the access and secret keys.
Use the REST API; documentation is on the Ceph website.
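For a quick smoke test of the S3-compatible API, a sketch using awscli; it assumes the gateway listens on node1 on the default port 7480, and uses the keys from the user info output:
export AWS_ACCESS_KEY_ID=C8XTR17Z7MUHUUMF8105
export AWS_SECRET_ACCESS_KEY=VgTQMsEgh990uLBG0im3BWN21Q78sMjYc4icD5Bx
aws --endpoint-url http://node1:7480 s3 mb s3://test-bucket
aws --endpoint-url http://node1:7480 s3 cp /etc/hosts s3://test-bucket/
aws --endpoint-url http://node1:7480 s3 ls s3://test-bucket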
export SPARK_HOME=/store/spark-2.3.1-bin-hadoop2.7-nohive/
export HADOOP_HOME=/store/hadoop-2.7.1/
export HIVE_HOME=/store/apache-hive-3.0.0-bin/
Recreate the HDFS state (format the namenode, then recreate the directories):
sudo $HADOOP_HOME/bin/hdfs namenode -format
# or, without sudo, if hdfs is on the PATH:
hdfs namenode -format
sudo $HADOOP_HOME/bin/hadoop fs -rm -R /user
sudo $HADOOP_HOME/bin/hadoop fs -rm -R /tmp
sudo $HADOOP_HOME/bin/hadoop fs -mkdir /user/
sudo $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/
sudo $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
sudo $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
# this is not needed as it restricts rights
sudo $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
sudo $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
# the same, without sudo:
hadoop fs -mkdir /user/
hadoop fs -mkdir /user/hive/
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -mkdir /tmp
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
sudo $HADOOP_HOME/bin/hadoop fs -ls /user/hive/warehouse/
Starting Hive:
sudo $HIVE_HOME/bin/schematool -dbType derby -initSchema
sudo $HIVE_HOME/bin/hiveserver2
sudo $HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000
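Once HiveServer2 is up, a quick smoke test through beeline:
$HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -e 'SHOW DATABASES;'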
Hive needs to be built from the master branch, as it supports Spark 2.3.1:
- edit Hive pom.xml and set spark.version to 2.3.1
- after it's built, copy hive/spark-client/target/hive-spark-client-4.0.0-SNAPSHOT.jar to the distribution's lib folder
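The build itself is the usual Maven invocation (a sketch; run it from the Hive checkout):
mvn clean package -DskipTests -Pdist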
Prepare a Spark 2.3.1 distribution; it must not contain Hive or Hadoop:
./dev/make-distribution.sh --name 'hadoop27-nohive-k8s' --tgz '-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided,orc-provided,kubernetes'
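The tarball lands in the Spark source root, named spark-<version>-bin-<name>.tgz; unpack it under /store, e.g.:
tar xzf spark-2.3.1-bin-hadoop27-nohive-k8s.tgz -C /store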
ln -s $SPARK_HOME/jars/spark-network-common_2.11-2.3.1.jar $HIVE_HOME/lib/spark-network-common_2.11-2.3.1.jar
ln -s $SPARK_HOME/jars/spark-core_2.11-2.3.1.jar $HIVE_HOME/lib/spark-core_2.11-2.3.1.jar
ln -s $SPARK_HOME/jars/scala-library-2.11.8.jar $HIVE_HOME/lib/scala-library-2.11.8.jar
ln -s $SPARK_HOME/jars/spark-launcher_2.11-2.3.1.jar $HIVE_HOME/lib/spark-launcher_2.11-2.3.1.jar
ln -s $SPARK_HOME/jars/chill-java-0.8.4.jar $HIVE_HOME/lib/chill-java-0.8.4.jar
ln -s $SPARK_HOME/jars/jersey-server-2.22.2.jar $HIVE_HOME/lib/jersey-server-2.22.2.jar
ln -s $SPARK_HOME/jars/spark-network-shuffle_2.11-2.3.1.jar $HIVE_HOME/lib/spark-network-shuffle_2.11-2.3.1.jar
ln -s $SPARK_HOME/jars/jackson-module-scala_2.11-2.6.7.1.jar $HIVE_HOME/lib/jackson-module-scala_2.11-2.6.7.1.jar
ln -s $SPARK_HOME/jars/jackson-module-paranamer-2.7.9.jar $HIVE_HOME/lib/jackson-module-paranamer-2.7.9.jar
ln -s $SPARK_HOME/jars/jackson-annotations-2.6.7.jar $HIVE_HOME/lib/jackson-annotations-2.6.7.jar
ln -s $SPARK_HOME/jars/jackson-databind-2.6.7.1.jar $HIVE_HOME/lib/jackson-databind-2.6.7.1.jar
ln -s $SPARK_HOME/jars/jersey-container-servlet-core-2.22.2.jar $HIVE_HOME/lib/jersey-container-servlet-core-2.22.2.jar
ln -s $SPARK_HOME/jars/json4s-ast_2.11-3.2.11.jar $HIVE_HOME/lib/json4s-ast_2.11-3.2.11.jar
ln -s $SPARK_HOME/jars/kryo-shaded-3.0.3.jar $HIVE_HOME/lib/kryo-shaded-3.0.3.jar
ln -s $SPARK_HOME/jars/minlog-1.3.0.jar $HIVE_HOME/lib/minlog-1.3.0.jar
ln -s $SPARK_HOME/jars/scala-xml_2.11-1.0.5.jar $HIVE_HOME/lib/scala-xml_2.11-1.0.5.jar
ln -s $SPARK_HOME/jars/spark-unsafe_2.11-2.3.1.jar $HIVE_HOME/lib/spark-unsafe_2.11-2.3.1.jar
ln -s $SPARK_HOME/jars/xbean-asm5-shaded-4.4.jar $HIVE_HOME/lib/xbean-asm5-shaded-4.4.jar
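The five Spark jars share the same version suffix, so they can also be linked in one loop (a sketch; the remaining jars above still need individual links):
for j in spark-network-common spark-core spark-launcher spark-network-shuffle spark-unsafe; do
  ln -s $SPARK_HOME/jars/${j}_2.11-2.3.1.jar $HIVE_HOME/lib/
done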
FIX old netty in Hadoop 2.7:
Copy netty-all from $SPARK_HOME/share to /store/hadoop-2.7.1/share/hadoop/common/lib
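e.g. (a sketch; verify the exact jar name, and drop the old netty-all first):
rm $HADOOP_HOME/share/hadoop/common/lib/netty-all-*.jar
cp $SPARK_HOME/share/netty-all-*.jar $HADOOP_HOME/share/hadoop/common/lib/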
mkdir /var/log/spark
bin/spark-shell --master spark://lizard77x:7077
sudo $HADOOP_HOME/bin/hadoop fs -mkdir /taxi_dat
hadoop fs -copyFromLocal /store/taxi.dat /taxi_dat/
bin/spark-shell --master spark://lizard77x:7077 --driver-memory 10G --executor-memory 15G
Then in the shell:
val lines = sc.textFile("/taxi_dat/taxi.dat")
lines.count
Hive on Spark:
- not working with Ceph
- works with Hadoop; it's only a matter of pointing Spark at the correct Hadoop instance
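For reference, the relevant Hive settings (values assume the standalone master above; set them in hive-site.xml or per session in beeline):
set hive.execution.engine=spark;
set spark.master=spark://lizard77x:7077;
set spark.home=/store/spark-2.3.1-bin-hadoop2.7-nohive;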
pip3 install --user -U jupyterlab
jupyter toree install --user --replace --spark_home=$SPARK_HOME --kernel_name="spark-toree" --spark_opts="--master=spark://lizard77x:7077"
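Verify the kernel was registered:
jupyter kernelspec list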
RADOS GW credentials (from radosgw-admin user info):
"user": "michal",
"access_key": "C8XTR17Z7MUHUUMF8105",
"secret_key": "VgTQMsEgh990uLBG0im3BWN21Q78sMjYc4icD5Bx"