Run Spark MLlib and Scala in Google Colab with Almond
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"collapsed_sections": [],
"toc_visible": true,
"include_colab_link": true
},
"kernelspec": {
"display_name": "Scala",
"name": "scala"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/kirisakow/2f6ef957673df6dcbc20bcdaa33c202a/run_spark_mllib_scala_in_colab_with_almond.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Run Spark MLlib and Scala in Google Colab with Almond",
"\n\n",
"### <u>**Deprecation warning:**</u> The Google Colab interface appears to have changed and no longer allows side kernels to be used the way it used to. I have stopped using Google Colab myself and now use Docker images and containers instead. To run Scala in a Jupyter Notebook inside a Docker container, you can follow this [guide of mine](https://github.com/kirisakow/scala-jupyter-container), which is based on the `jupyter/all-spark-notebook` Docker image and the latest Almond and Scala. Happy coding!"
],
"metadata": {
"id": "tnRm0YwmdLhl"
}
},
{
"cell_type": "markdown",
"source": [
"## Important prerequisite 1 / 4\n",
"\n",
"Open your Colab Notebook with a text editor and make sure the `kernelspec` key is set to work with Scala, like so:\n",
"\n",
"```json\n",
"{\n",
" ⋮\n",
" \"kernelspec\": {\n",
" \"display_name\": \"Scala\",\n",
" \"name\": \"scala\"\n",
" }\n",
" ⋮\n",
"}\n",
"```"
],
"metadata": {
"id": "UMudsO4-dQ03"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "QVJoUDPtb9gX"
},
"source": [
"## Important prerequisite 2 / 4\n",
"\n",
"Run the cell below to [install the Almond kernel](https://almond.sh/docs/quick-start-install) into the global Jupyter kernels:"
]
},
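{
"cell_type": "code",
"source": [
"# Download the coursier launcher and make it executable\n",
"! curl -sS -Lo coursier https://git.io/coursier-cli\n",
"! chmod +x coursier\n",
"# Python variables; IPython expands $NAME inside the ! shell commands below\n",
"SCALA_VERSION=\"2.12.8\"\n",
"ALMOND_VERSION=\"0.3.1\"\n",
"# Build the Almond launcher, install it as a Jupyter kernel, then clean up\n",
"! ./coursier bootstrap -r jitpack -i user -I user:sh.almond:scala-kernel-api_$SCALA_VERSION:$ALMOND_VERSION sh.almond:scala-kernel_$SCALA_VERSION:$ALMOND_VERSION -o almond 1>/dev/null 2>&1\n",
"! ./almond --install 1>/dev/null\n",
"! rm -f ./coursier ./almond"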
],
"metadata": {
"id": "j-1b2BcOm6py"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Important prerequisite 3 / 4\n",
"\n",
"Reload the Google Colab page so that the Scala kernel is activated."
],
"metadata": {
"id": "wyH-FiPgxfIL"
}
},
{
"cell_type": "markdown",
"source": [
"Now you can work in Scala:"
],
"metadata": {
"id": "5hDyl5WedYRK"
}
},
{
"cell_type": "code",
"source": [
"println(scala.util.Properties.versionString)"
],
"metadata": {
"id": "N8XpeKoGnqWJ",
"outputId": "e4fc5ea1-0f92-4608-a1d7-4993a21fa419",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"version 2.12.8\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## Important prerequisite 4 / 4\n",
"\n",
"Download the dependencies (Almond's Spark integration, Spark SQL and Spark MLlib):"
],
"metadata": {
"id": "ivoZNETEXxwy"
}
},
{
"cell_type": "code",
"source": [
"import $ivy.`sh.almond::almond-spark:0.3.0`\n",
"import $ivy.`org.apache.spark::spark-sql:2.4.0`\n",
"import $ivy.`org.apache.spark::spark-mllib:2.4.0`"
],
"metadata": {
"id": "5Dawce-sDcZB"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import org.apache.log4j.{Level, Logger}\n",
"\n",
"Logger.getLogger(\"org\").setLevel(Level.OFF)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Bbwh-nrQE7cP",
"outputId": "2289b274-0357-4973-ddf6-3ad44a5346ef"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"\u001b[32mimport \u001b[39m\u001b[36morg.apache.log4j.{Level, Logger}\n",
"\n",
"\u001b[39m"
]
},
"metadata": {},
"execution_count": 3
}
]
},
{
"cell_type": "markdown",
"source": [
"Initialize a `SparkSession` instance:"
],
"metadata": {
"id": "L_8OZ3r7YB8f"
}
},
{
"cell_type": "code",
"source": [
"import org.apache.spark.sql._\n",
"\n",
"val spark = {\n",
" NotebookSparkSession.builder()\n",
" .master(\"local[*]\")\n",
" .config(\"spark.ui.port\", \"4050\")\n",
" .getOrCreate()\n",
"}"
],
"metadata": {
"id": "kgZa-oheE_P3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Make a dummy dataset:"
],
"metadata": {
"id": "3t0rnZQQYHs6"
}
},
{
"cell_type": "code",
"source": [
"import spark.implicits._\n",
"\n",
"val data = Seq((1,2,3), (4,5,6), (6,7,8), (9,19,10))\n",
"val ds = spark.createDataset(data)\n",
"ds.show()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "U7u5ztb_GHCG",
"outputId": "4e7795a6-a90a-4672-894c-f59cbcafc74c"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"+---+---+---+\n",
"| _1| _2| _3|\n",
"+---+---+---+\n",
"| 1| 2| 3|\n",
"| 4| 5| 6|\n",
"| 6| 7| 8|\n",
"| 9| 19| 10|\n",
"+---+---+---+\n",
"\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"\u001b[32mimport \u001b[39m\u001b[36mspark.implicits._\n",
"\n",
"\u001b[39m\n",
"\u001b[36mdata\u001b[39m: \u001b[32mSeq\u001b[39m[(\u001b[32mInt\u001b[39m, \u001b[32mInt\u001b[39m, \u001b[32mInt\u001b[39m)] = \u001b[33mList\u001b[39m((\u001b[32m1\u001b[39m, \u001b[32m2\u001b[39m, \u001b[32m3\u001b[39m), (\u001b[32m4\u001b[39m, \u001b[32m5\u001b[39m, \u001b[32m6\u001b[39m), (\u001b[32m6\u001b[39m, \u001b[32m7\u001b[39m, \u001b[32m8\u001b[39m), (\u001b[32m9\u001b[39m, \u001b[32m19\u001b[39m, \u001b[32m10\u001b[39m))\n",
"\u001b[36mds\u001b[39m: \u001b[32mDataset\u001b[39m[(\u001b[32mInt\u001b[39m, \u001b[32mInt\u001b[39m, \u001b[32mInt\u001b[39m)] = [_1: int, _2: int ... 1 more field]"
]
},
"metadata": {},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"source": [
"Retrieve a remote dataset:"
],
"metadata": {
"id": "UwLXm8t6YNoD"
}
},
{
"cell_type": "code",
"source": [
"import org.apache.spark.SparkFiles\n",
"\n",
"spark.sparkContext.addFile(\n",
" \"https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_libsvm_data.txt\"\n",
")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1GpokHaQVj1v",
"outputId": "dbdb4f11-7a35-4f02-93ab-8d7a6287dc52"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"\u001b[32mimport \u001b[39m\u001b[36morg.apache.spark.SparkFiles\n",
"\n",
"\u001b[39m"
]
},
"metadata": {},
"execution_count": 11
}
]
},
{
"cell_type": "markdown",
"source": [
"Fit a binomial logistic regression:"
],
"metadata": {
"id": "7o11CiK7YYFp"
}
},
{
"cell_type": "code",
"source": [
"import org.apache.spark.ml.classification.LogisticRegression\n",
"\n",
"// Load training data\n",
"val training = spark.read.format(\"libsvm\").load(SparkFiles.get(\"sample_libsvm_data.txt\"))\n",
"\n",
"val lr = new LogisticRegression()\n",
" .setMaxIter(10)\n",
" .setRegParam(0.3)\n",
" .setElasticNetParam(0.8)\n",
"\n",
"// Fit the model\n",
"val lrModel = lr.fit(training)"
],
"metadata": {
"id": "6ag_AB6gRCxW"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"// Print the coefficients and intercept for logistic regression\n",
"println(s\"Intercept: ${lrModel.intercept}\")\n",
"println(s\"Coefficients: ${lrModel.coefficients}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1N2UiD3yWJ3o",
"outputId": "125ce63c-9af6-45c9-da5e-392f82bc54e1"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Intercept: 0.22456315961250325\n",
"Coefficients: (692,[244,263,272,300,301,328,350,351,378,379,405,406,407,428,433,434,455,456,461,462,483,484,489,490,496,511,512,517,539,540,568],[-7.353983524188197E-5,-9.102738505589466E-5,-1.9467430546904298E-4,-2.0300642473486668E-4,-3.1476183314863995E-5,-6.842977602660743E-5,1.5883626898239883E-5,1.4023497091372047E-5,3.5432047524968605E-4,1.1443272898171087E-4,1.0016712383666666E-4,6.014109303795481E-4,2.840248179122762E-4,-1.1541084736508837E-4,3.85996886312906E-4,6.35019557424107E-4,-1.1506412384575676E-4,-1.5271865864986808E-4,2.804933808994214E-4,6.070117471191634E-4,-2.008459663247437E-4,-1.421075579290126E-4,2.739010341160883E-4,2.7730456244968115E-4,-9.838027027269332E-5,-3.808522443517704E-4,-2.5315198008555033E-4,2.7747714770754307E-4,-2.443619763919199E-4,-0.0015394744687597765,-2.3073328411331293E-4])\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Fit a multinomial logistic regression:"
],
"metadata": {
"id": "J6DSqYlQYgC8"
}
},
{
"cell_type": "code",
"source": [
"// We can also use the multinomial family for binary classification\n",
"val mlr = new LogisticRegression()\n",
" .setMaxIter(10)\n",
" .setRegParam(0.3)\n",
" .setElasticNetParam(0.8)\n",
" .setFamily(\"multinomial\")\n",
"\n",
"val mlrModel = mlr.fit(training)"
],
"metadata": {
"id": "5I0YikbXWVqS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"// Print the coefficients and intercepts for logistic regression with multinomial family\n",
"println(s\"Multinomial intercepts: ${mlrModel.interceptVector}\")\n",
"println(s\"Multinomial coefficients: ${mlrModel.coefficientMatrix}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "yWoW1vYQWeJW",
"outputId": "63014263-aa61-4c4b-b042-b04dae8bddcc"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Multinomial intercepts: [-0.12065879445860686,0.12065879445860686]\n",
"Multinomial coefficients: 2 x 692 CSCMatrix\n",
"(0,244) 4.290365458958277E-5\n",
"(1,244) -4.290365458958294E-5\n",
"(0,263) 6.488313287833108E-5\n",
"(1,263) -6.488313287833092E-5\n",
"(0,272) 1.2140666790834663E-4\n",
"(1,272) -1.2140666790834657E-4\n",
"(0,300) 1.3231861518665612E-4\n",
"(1,300) -1.3231861518665607E-4\n",
"(0,350) -6.775444746760509E-7\n",
"(1,350) 6.775444746761932E-7\n",
"(0,351) -4.899237909429297E-7\n",
"(1,351) 4.899237909430322E-7\n",
"(0,378) -3.5812102770679596E-5\n",
"(1,378) 3.581210277067968E-5\n",
"(0,379) -2.3539704331222065E-5\n",
"(1,379) 2.353970433122204E-5\n",
"(0,405) -1.90295199030314E-5\n",
"(1,405) 1.90295199030314E-5\n",
"(0,406) -5.626696935778909E-4\n",
"(1,406) 5.626696935778912E-4\n",
"(0,407) -5.121519619099504E-5\n",
"(1,407) 5.1215196190995074E-5\n",
"(0,428) 8.080614545413342E-5\n",
"(1,428) -8.080614545413331E-5\n",
"(0,433) -4.256734915330487E-5\n",
"(1,433) 4.256734915330495E-5\n",
"(0,434) -7.080191510151425E-4\n",
"(1,434) 7.080191510151435E-4\n",
"(0,455) 8.094482475733589E-5\n",
"(1,455) -8.094482475733582E-5\n",
"(0,456) 1.0433687128309833E-4\n",
"(1,456) -1.0433687128309814E-4\n",
"(0,461) -5.4466605046259246E-5\n",
"(1,461) 5.4466605046259286E-5\n",
"(0,462) -5.667133061990392E-4\n",
"(1,462) 5.667133061990392E-4\n",
"(0,483) 1.2495896045528374E-4\n",
"(1,483) -1.249589604552838E-4\n",
"(0,484) 9.810519424784944E-5\n",
"(1,484) -9.810519424784941E-5\n",
"(0,489) -4.88440907254626E-5\n",
"(1,489) 4.8844090725462606E-5\n",
"(0,490) -4.324392733454803E-5\n",
"(1,490) 4.324392733454811E-5\n",
"(0,496) 6.903351855620161E-5\n",
"(1,496) -6.90335185562012E-5\n",
"(0,511) 3.946505594172827E-4\n",
"(1,511) -3.946505594172831E-4\n",
"(0,512) 2.621745995919226E-4\n",
"(1,512) -2.621745995919226E-4\n",
"(0,517) -4.459475951170906E-5\n",
"(1,517) 4.459475951170901E-5\n",
"(0,539) 2.5417562428184555E-4\n",
"(1,539) -2.5417562428184555E-4\n",
"(0,540) 5.271781246228031E-4\n",
"(1,540) -5.271781246228032E-4\n",
"(0,568) 1.860255150352447E-4\n",
"(1,568) -1.8602551503524485E-4\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Run a simple ML Pipeline on a dummy natural-language dataset:"
],
"metadata": {
"id": "SN4c8ylcbFK0"
}
},
{
"cell_type": "code",
"source": [
"import org.apache.spark.ml.{Pipeline, PipelineModel}\n",
"import org.apache.spark.ml.classification.LogisticRegression\n",
"import org.apache.spark.ml.feature.{HashingTF, Tokenizer}\n",
"import org.apache.spark.ml.linalg.Vector\n",
"import org.apache.spark.sql.Row\n",
"\n",
"// Prepare training documents from a list of (id, text, label) tuples.\n",
"val training = spark.createDataFrame(Seq(\n",
" (0L, \"a b c d e spark\", 1.0),\n",
" (1L, \"b d\", 0.0),\n",
" (2L, \"spark f g h\", 1.0),\n",
" (3L, \"hadoop mapreduce\", 0.0)\n",
")).toDF(\"id\", \"text\", \"label\")\n",
"\n",
"// Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.\n",
"val tokenizer = new Tokenizer()\n",
" .setInputCol(\"text\")\n",
" .setOutputCol(\"words\")\n",
"val hashingTF = new HashingTF()\n",
" .setNumFeatures(1000)\n",
" .setInputCol(tokenizer.getOutputCol)\n",
" .setOutputCol(\"features\")\n",
"val lr = new LogisticRegression()\n",
" .setMaxIter(10)\n",
" .setRegParam(0.001)\n",
"val pipeline = new Pipeline()\n",
" .setStages(Array(tokenizer, hashingTF, lr))\n",
"\n",
"// Fit the pipeline to training documents.\n",
"val model = pipeline.fit(training)"
],
"metadata": {
"id": "Nj6nTB1LZx3B"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"// Now we can optionally save the fitted pipeline to disk\n",
"model.write.overwrite().save(\"/tmp/spark-logistic-regression-model\")\n",
"\n",
"// We can also save this unfit pipeline to disk\n",
"pipeline.write.overwrite().save(\"/tmp/unfit-lr-model\")\n",
"\n",
"// And load it back in during production\n",
"val sameModel = PipelineModel.load(\"/tmp/spark-logistic-regression-model\")\n",
"\n",
"// Prepare test documents, which are unlabeled (id, text) tuples.\n",
"val test = spark.createDataFrame(Seq(\n",
" (4L, \"spark i j k\"),\n",
" (5L, \"l m n\"),\n",
" (6L, \"spark hadoop spark\"),\n",
" (7L, \"apache hadoop\")\n",
")).toDF(\"id\", \"text\")\n",
"\n",
"// Make predictions on test documents.\n",
"model.transform(test)\n",
" .select(\"id\", \"text\", \"probability\", \"prediction\")\n",
" .collect()\n",
" .foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>\n",
" println(s\"($id, $text) --> prob=$prob, prediction=$prediction\")\n",
" }"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 781
},
"id": "rrIgES8LZ-MS",
"outputId": "773ad554-751e-4124-bb50-4922a08ff134"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"(4, spark i j k) --> prob=[0.15964077387874118,0.8403592261212589], prediction=1.0\n",
"(5, l m n) --> prob=[0.8378325685476612,0.16216743145233875], prediction=0.0\n",
"(6, spark hadoop spark) --> prob=[0.06926633132976273,0.9307336686702373], prediction=1.0\n",
"(7, apache hadoop) --> prob=[0.9821575333444208,0.01784246665557917], prediction=0.0\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"\u001b[36msameModel\u001b[39m: \u001b[32mPipelineModel\u001b[39m = pipeline_33376d963408\n",
"\u001b[36mtest\u001b[39m: \u001b[32mDataFrame\u001b[39m = [id: bigint, text: string]"
]
},
"metadata": {},
"execution_count": 23
}
]
}
]
}
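
Prerequisite 1 in the notebook above is a manual edit of the `kernelspec` key. As a rough sketch, the same edit could be scripted in Python (the language Colab runs by default) using only the standard library. The function names `set_scala_kernelspec` and `rewrite_notebook`, and the file name in the usage note, are made up for illustration; only the two `kernelspec` values come from the notebook itself.

```python
import json
from pathlib import Path

# Kernelspec values copied from prerequisite 1 of the notebook.
SCALA_KERNELSPEC = {"display_name": "Scala", "name": "scala"}


def set_scala_kernelspec(notebook: dict) -> dict:
    """Return a copy of a parsed .ipynb document whose kernelspec points at Scala.

    Hypothetical helper: other metadata keys (for example "colab") are kept as-is.
    """
    updated = dict(notebook)
    metadata = dict(updated.get("metadata", {}))
    metadata["kernelspec"] = dict(SCALA_KERNELSPEC)
    updated["metadata"] = metadata
    return updated


def rewrite_notebook(path: str) -> None:
    """Read the notebook at `path`, set the kernelspec, and write it back in place."""
    file = Path(path)
    notebook = json.loads(file.read_text(encoding="utf-8"))
    file.write_text(
        json.dumps(set_scala_kernelspec(notebook), indent=1, ensure_ascii=False),
        encoding="utf-8",
    )
```

For instance, `rewrite_notebook("my_notebook.ipynb")` (a hypothetical file name) would update a downloaded copy of a notebook in place. This only covers prerequisite 1; the Almond kernel still has to be installed as described in prerequisite 2.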
@Platinum-Dragon

I followed prerequisites 1 & 2 exactly, but on #3 I get the same thing:

image

@kirisakow (Author)

You need to reload the Google Colab page after you have installed coursier and Almond.

@sunnysea

image

Hello! After steps 1 & 2 and reloading the page, I get this error. Could you tell me what I should do in this case, please?

@kirisakow (Author)

To @sunnysea @Platinum-Dragon and anyone who is reading this:

The Google Colab interface appears to have changed and no longer allows side kernels to be used the way it used to.

I have stopped using Google Colab myself and now use Docker images and containers instead.

To run Scala in a Jupyter Notebook inside a Docker container, you can follow this guide of mine (https://github.com/kirisakow/scala-jupyter-container), which is based on the jupyter/all-spark-notebook Docker image and the latest Almond and Scala.

Happy coding!
