Last active May 29, 2023 15:53
Run Spark MLlib and Scala in Google Colab with Almond
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "collapsed_sections": [],
      "toc_visible": true,
      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Scala",
      "name": "scala"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/kirisakow/2f6ef957673df6dcbc20bcdaa33c202a/run_spark_mllib_scala_in_colab_with_almond.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Run Spark MLlib and Scala in Google Colab with Almond",
        "\n\n",
        "### <u>**Deprecation warning:**</u> The Google Colab interface seems to have changed and no longer lets you use side kernels the way it used to. I have stopped using Google Colab myself and now use Docker images and containers instead. To run Scala in Jupyter Notebook as a Docker container, you can follow this [guide of mine](https://github.com/kirisakow/scala-jupyter-container), which is based on the `jupyter/all-spark-notebook` Docker image and the latest versions of Almond and Scala. Happy coding!"
      ],
      "metadata": {
        "id": "tnRm0YwmdLhl"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Important prerequisite 1 / 4\n",
        "\n",
        "Open your Colab notebook (the `.ipynb` file) in a text editor and make sure the `kernelspec` key is set to Scala, like so:\n",
        "\n",
        "```json\n",
        "{\n",
        "  ⋮\n",
        "  \"kernelspec\": {\n",
        "    \"display_name\": \"Scala\",\n",
        "    \"name\": \"scala\"\n",
        "  }\n",
        "  ⋮\n",
        "}\n",
        "```"
      ],
      "metadata": {
        "id": "UMudsO4-dQ03"
      }
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QVJoUDPtb9gX"
      },
      "source": [
        "## Important prerequisite 2 / 4\n",
        "\n",
        "Run the cell below to [install the Almond kernel](https://almond.sh/docs/quick-start-install) and register it as a global Jupyter kernel:"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Download the coursier launcher and make it executable\n",
        "! curl -sS -Lo coursier https://git.io/coursier-cli\n",
        "! chmod +x coursier\n",
        "# Versions to install (IPython substitutes these Python variables into the `!` commands)\n",
        "SCALA_VERSION=\"2.12.8\"\n",
        "ALMOND_VERSION=\"0.3.1\"\n",
        "# Build the Almond launcher, then register it as a Jupyter kernel\n",
        "! ./coursier bootstrap -r jitpack -i user -I user:sh.almond:scala-kernel-api_$SCALA_VERSION:$ALMOND_VERSION sh.almond:scala-kernel_$SCALA_VERSION:$ALMOND_VERSION -o almond 1>/dev/null 2>&1\n",
        "! ./almond --install 1>/dev/null\n",
        "# Clean up the downloaded launchers\n",
        "! rm -f ./coursier ./almond"
      ],
      "metadata": {
        "id": "j-1b2BcOm6py"
      },
      "execution_count": null,
      "outputs": []
    },
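    {
      "cell_type": "markdown",
      "source": [
        "Optionally, before reloading the page, check that the kernel was registered. The list printed by `jupyter kernelspec list` should contain an entry named `scala`. This assumes Almond's default kernel id, which is also the `name` used in the `kernelspec` of prerequisite 1:"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# List the installed Jupyter kernels; an entry named `scala` is expected\n",
        "! jupyter kernelspec list"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },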
    {
      "cell_type": "markdown",
      "source": [
        "## Important prerequisite 3 / 4\n",
        "\n",
        "Reload the Google Colab page to activate the Scala kernel."
      ],
      "metadata": {
        "id": "wyH-FiPgxfIL"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now you can work in Scala:"
      ],
      "metadata": {
        "id": "5hDyl5WedYRK"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "println(scala.util.Properties.versionString)"
      ],
      "metadata": {
        "id": "N8XpeKoGnqWJ",
        "outputId": "e4fc5ea1-0f92-4608-a1d7-4993a21fa419",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "version 2.12.8\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Important prerequisite 4 / 4\n",
        "\n",
        "Download the dependencies:"
      ],
      "metadata": {
        "id": "ivoZNETEXxwy"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "// `::` appends the Scala binary version (2.12 here) to the artifact name\n",
        "import $ivy.`sh.almond::almond-spark:0.3.0`\n",
        "import $ivy.`org.apache.spark::spark-sql:2.4.0`\n",
        "import $ivy.`org.apache.spark::spark-mllib:2.4.0`"
      ],
      "metadata": {
        "id": "5Dawce-sDcZB"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import org.apache.log4j.{Level, Logger}\n",
        "\n",
        "// Silence log output from all loggers under the `org` namespace (Spark, Hadoop, ...)\n",
        "Logger.getLogger(\"org\").setLevel(Level.OFF)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Bbwh-nrQE7cP",
        "outputId": "2289b274-0357-4973-ddf6-3ad44a5346ef"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "\u001b[32mimport \u001b[39m\u001b[36morg.apache.log4j.{Level, Logger}\n",
              "\n",
              "\u001b[39m"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Initialize a SparkSession instance:"
      ],
      "metadata": {
        "id": "L_8OZ3r7YB8f"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import org.apache.spark.sql._\n",
        "\n",
        "// local[*]: run Spark locally with as many worker threads as logical cores\n",
        "val spark = {\n",
        "  NotebookSparkSession.builder()\n",
        "    .master(\"local[*]\")\n",
        "    .config(\"spark.ui.port\", \"4050\")\n",
        "    .getOrCreate()\n",
        "}"
      ],
      "metadata": {
        "id": "kgZa-oheE_P3"
      },
      "execution_count": null,
      "outputs": []
    },
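    {
      "cell_type": "markdown",
      "source": [
        "Optionally, check which Spark version the session runs and where. With the imports above, Spark 2.4.0 and a `local[*]` master are expected:"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "// Spark version and master URL of the running session\n",
        "println(s\"Spark version: ${spark.version}\")\n",
        "println(s\"Master: ${spark.sparkContext.master}\")"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },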
    {
      "cell_type": "markdown",
      "source": [
        "Make a dummy dataset:"
      ],
      "metadata": {
        "id": "3t0rnZQQYHs6"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import spark.implicits._\n",
        "\n",
        "val data = Seq((1,2,3), (4,5,6), (6,7,8), (9,19,10))\n",
        "val ds = spark.createDataset(data)\n",
        "ds.show()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "U7u5ztb_GHCG",
        "outputId": "4e7795a6-a90a-4672-894c-f59cbcafc74c"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "+---+---+---+\n",
            "| _1| _2| _3|\n",
            "+---+---+---+\n",
            "|  1|  2|  3|\n",
            "|  4|  5|  6|\n",
            "|  6|  7|  8|\n",
            "|  9| 19| 10|\n",
            "+---+---+---+\n",
            "\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "\u001b[32mimport \u001b[39m\u001b[36mspark.implicits._\n",
              "\n",
              "\u001b[39m\n",
              "\u001b[36mdata\u001b[39m: \u001b[32mSeq\u001b[39m[(\u001b[32mInt\u001b[39m, \u001b[32mInt\u001b[39m, \u001b[32mInt\u001b[39m)] = \u001b[33mList\u001b[39m((\u001b[32m1\u001b[39m, \u001b[32m2\u001b[39m, \u001b[32m3\u001b[39m), (\u001b[32m4\u001b[39m, \u001b[32m5\u001b[39m, \u001b[32m6\u001b[39m), (\u001b[32m6\u001b[39m, \u001b[32m7\u001b[39m, \u001b[32m8\u001b[39m), (\u001b[32m9\u001b[39m, \u001b[32m19\u001b[39m, \u001b[32m10\u001b[39m))\n",
              "\u001b[36mds\u001b[39m: \u001b[32mDataset\u001b[39m[(\u001b[32mInt\u001b[39m, \u001b[32mInt\u001b[39m, \u001b[32mInt\u001b[39m)] = [_1: int, _2: int ... 1 more field]"
            ]
          },
          "metadata": {},
          "execution_count": 5
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Retrieve a remote dataset (`addFile` downloads it so that every Spark node can access a local copy through `SparkFiles.get`):"
      ],
      "metadata": {
        "id": "UwLXm8t6YNoD"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import org.apache.spark.SparkFiles\n",
        "\n",
        "spark.sparkContext.addFile(\n",
        "  \"https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_libsvm_data.txt\"\n",
        ")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "1GpokHaQVj1v",
        "outputId": "dbdb4f11-7a35-4f02-93ab-8d7a6287dc52"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "\u001b[32mimport \u001b[39m\u001b[36morg.apache.spark.SparkFiles\n",
              "\n",
              "\u001b[39m"
            ]
          },
          "metadata": {},
          "execution_count": 11
        }
      ]
    },
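    {
      "cell_type": "markdown",
      "source": [
        "Optionally, inspect the local copy that `addFile` downloaded. `SparkFiles.get` returns its path on this machine. The file is in LIBSVM format, where each line reads `<label> <index>:<value> ...`:"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "// Path of the local copy created by addFile() (chosen by Spark at runtime)\n",
        "val localPath = SparkFiles.get(\"sample_libsvm_data.txt\")\n",
        "println(s\"Local copy: $localPath\")\n",
        "\n",
        "val src = scala.io.Source.fromFile(localPath)\n",
        "try {\n",
        "  val lines = src.getLines().toList\n",
        "  println(s\"Number of lines: ${lines.size}\")\n",
        "  // Print the start of the first record: <label> <index>:<value> ...\n",
        "  println(lines.head.take(80))\n",
        "} finally {\n",
        "  src.close()\n",
        "}"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },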
    {
      "cell_type": "markdown",
      "source": [
        "Do a binomial logistic regression:"
      ],
      "metadata": {
        "id": "7o11CiK7YYFp"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import org.apache.spark.ml.classification.LogisticRegression\n",
        "\n",
        "// Load training data\n",
        "val training = spark.read.format(\"libsvm\").load(SparkFiles.get(\"sample_libsvm_data.txt\"))\n",
        "\n",
        "val lr = new LogisticRegression()\n",
        "  .setMaxIter(10)\n",
        "  .setRegParam(0.3)\n",
        "  .setElasticNetParam(0.8)\n",
        "\n",
        "// Fit the model\n",
        "val lrModel = lr.fit(training)"
      ],
      "metadata": {
        "id": "6ag_AB6gRCxW"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "// Print the coefficients and intercept for logistic regression\n",
        "println(s\"Intercept: ${lrModel.intercept}\")\n",
        "println(s\"Coefficients: ${lrModel.coefficients}\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "1N2UiD3yWJ3o",
        "outputId": "125ce63c-9af6-45c9-da5e-392f82bc54e1"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Intercept: 0.22456315961250325\n",
            "Coefficients: (692,[244,263,272,300,301,328,350,351,378,379,405,406,407,428,433,434,455,456,461,462,483,484,489,490,496,511,512,517,539,540,568],[-7.353983524188197E-5,-9.102738505589466E-5,-1.9467430546904298E-4,-2.0300642473486668E-4,-3.1476183314863995E-5,-6.842977602660743E-5,1.5883626898239883E-5,1.4023497091372047E-5,3.5432047524968605E-4,1.1443272898171087E-4,1.0016712383666666E-4,6.014109303795481E-4,2.840248179122762E-4,-1.1541084736508837E-4,3.85996886312906E-4,6.35019557424107E-4,-1.1506412384575676E-4,-1.5271865864986808E-4,2.804933808994214E-4,6.070117471191634E-4,-2.008459663247437E-4,-1.421075579290126E-4,2.739010341160883E-4,2.7730456244968115E-4,-9.838027027269332E-5,-3.808522443517704E-4,-2.5315198008555033E-4,2.7747714770754307E-4,-2.443619763919199E-4,-0.0015394744687597765,-2.3073328411331293E-4])\n"
          ]
        }
      ]
    },
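    {
      "cell_type": "markdown",
      "source": [
        "Optionally, inspect the training summary of the binomial model. This sketch assumes Spark 2.3 or later, where `LogisticRegressionModel.binarySummary` is available (the cells above import Spark 2.4.0). The figures it reports are computed on the training data:"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "// Training summary of the binomial model fitted above (Spark >= 2.3)\n",
        "val trainingSummary = lrModel.binarySummary\n",
        "\n",
        "// Value of the objective (loss) at each iteration of the optimizer\n",
        "trainingSummary.objectiveHistory.foreach(loss => println(loss))\n",
        "\n",
        "// Area under the ROC curve, computed on the training data\n",
        "println(s\"areaUnderROC: ${trainingSummary.areaUnderROC}\")"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },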
    {
      "cell_type": "markdown",
      "source": [
        "Do a multinomial logistic regression:"
      ],
      "metadata": {
        "id": "J6DSqYlQYgC8"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "// We can also use the multinomial family for binary classification\n",
        "val mlr = new LogisticRegression()\n",
        "  .setMaxIter(10)\n",
        "  .setRegParam(0.3)\n",
        "  .setElasticNetParam(0.8)\n",
        "  .setFamily(\"multinomial\")\n",
        "\n",
        "val mlrModel = mlr.fit(training)"
      ],
      "metadata": {
        "id": "5I0YikbXWVqS"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "// Print the coefficients and intercepts for logistic regression with multinomial family\n",
        "println(s\"Multinomial intercepts: ${mlrModel.interceptVector}\")\n",
        "println(s\"Multinomial coefficients: ${mlrModel.coefficientMatrix}\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "yWoW1vYQWeJW",
        "outputId": "63014263-aa61-4c4b-b042-b04dae8bddcc"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Multinomial intercepts: [-0.12065879445860686,0.12065879445860686]\n",
            "Multinomial coefficients: 2 x 692 CSCMatrix\n",
            "(0,244) 4.290365458958277E-5\n",
            "(1,244) -4.290365458958294E-5\n",
            "(0,263) 6.488313287833108E-5\n",
            "(1,263) -6.488313287833092E-5\n",
            "(0,272) 1.2140666790834663E-4\n",
            "(1,272) -1.2140666790834657E-4\n",
            "(0,300) 1.3231861518665612E-4\n",
            "(1,300) -1.3231861518665607E-4\n",
            "(0,350) -6.775444746760509E-7\n",
            "(1,350) 6.775444746761932E-7\n",
            "(0,351) -4.899237909429297E-7\n",
            "(1,351) 4.899237909430322E-7\n",
            "(0,378) -3.5812102770679596E-5\n",
            "(1,378) 3.581210277067968E-5\n",
            "(0,379) -2.3539704331222065E-5\n",
            "(1,379) 2.353970433122204E-5\n",
            "(0,405) -1.90295199030314E-5\n",
            "(1,405) 1.90295199030314E-5\n",
            "(0,406) -5.626696935778909E-4\n",
            "(1,406) 5.626696935778912E-4\n",
            "(0,407) -5.121519619099504E-5\n",
            "(1,407) 5.1215196190995074E-5\n",
            "(0,428) 8.080614545413342E-5\n",
            "(1,428) -8.080614545413331E-5\n",
            "(0,433) -4.256734915330487E-5\n",
            "(1,433) 4.256734915330495E-5\n",
            "(0,434) -7.080191510151425E-4\n",
            "(1,434) 7.080191510151435E-4\n",
            "(0,455) 8.094482475733589E-5\n",
            "(1,455) -8.094482475733582E-5\n",
            "(0,456) 1.0433687128309833E-4\n",
            "(1,456) -1.0433687128309814E-4\n",
            "(0,461) -5.4466605046259246E-5\n",
            "(1,461) 5.4466605046259286E-5\n",
            "(0,462) -5.667133061990392E-4\n",
            "(1,462) 5.667133061990392E-4\n",
            "(0,483) 1.2495896045528374E-4\n",
            "(1,483) -1.249589604552838E-4\n",
            "(0,484) 9.810519424784944E-5\n",
            "(1,484) -9.810519424784941E-5\n",
            "(0,489) -4.88440907254626E-5\n",
            "(1,489) 4.8844090725462606E-5\n",
            "(0,490) -4.324392733454803E-5\n",
            "(1,490) 4.324392733454811E-5\n",
            "(0,496) 6.903351855620161E-5\n",
            "(1,496) -6.90335185562012E-5\n",
            "(0,511) 3.946505594172827E-4\n",
            "(1,511) -3.946505594172831E-4\n",
            "(0,512) 2.621745995919226E-4\n",
            "(1,512) -2.621745995919226E-4\n",
            "(0,517) -4.459475951170906E-5\n",
            "(1,517) 4.459475951170901E-5\n",
            "(0,539) 2.5417562428184555E-4\n",
            "(1,539) -2.5417562428184555E-4\n",
            "(0,540) 5.271781246228031E-4\n",
            "(1,540) -5.271781246228032E-4\n",
            "(0,568) 1.860255150352447E-4\n",
            "(1,568) -1.8602551503524485E-4\n"
          ]
        }
      ]
    },
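    {
      "cell_type": "markdown",
      "source": [
        "Both families were fitted on the same binary data, so a quick sanity check is to compare their accuracy on that training set with `MulticlassClassificationEvaluator`. This measures fit to the training data only, not how well either model generalizes:"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator\n",
        "\n",
        "// Accuracy = fraction of rows whose predicted label equals the true label\n",
        "val evaluator = new MulticlassClassificationEvaluator()\n",
        "  .setLabelCol(\"label\")\n",
        "  .setPredictionCol(\"prediction\")\n",
        "  .setMetricName(\"accuracy\")\n",
        "\n",
        "// Both models are scored on the same training data they were fitted on\n",
        "println(s\"Binomial accuracy: ${evaluator.evaluate(lrModel.transform(training))}\")\n",
        "println(s\"Multinomial accuracy: ${evaluator.evaluate(mlrModel.transform(training))}\")"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },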
    {
      "cell_type": "markdown",
      "source": [
        "Build a simple ML Pipeline on a dummy natural-language dataset:"
      ],
      "metadata": {
        "id": "SN4c8ylcbFK0"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import org.apache.spark.ml.{Pipeline, PipelineModel}\n",
        "import org.apache.spark.ml.classification.LogisticRegression\n",
        "import org.apache.spark.ml.feature.{HashingTF, Tokenizer}\n",
        "import org.apache.spark.ml.linalg.Vector\n",
        "import org.apache.spark.sql.Row\n",
        "\n",
        "// Prepare training documents from a list of (id, text, label) tuples.\n",
        "// Note: this redefines `training` from the cells above.\n",
        "val training = spark.createDataFrame(Seq(\n",
        "  (0L, \"a b c d e spark\", 1.0),\n",
        "  (1L, \"b d\", 0.0),\n",
        "  (2L, \"spark f g h\", 1.0),\n",
        "  (3L, \"hadoop mapreduce\", 0.0)\n",
        ")).toDF(\"id\", \"text\", \"label\")\n",
        "\n",
        "// Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.\n",
        "val tokenizer = new Tokenizer()\n",
        "  .setInputCol(\"text\")\n",
        "  .setOutputCol(\"words\")\n",
        "val hashingTF = new HashingTF()\n",
        "  .setNumFeatures(1000)\n",
        "  .setInputCol(tokenizer.getOutputCol)\n",
        "  .setOutputCol(\"features\")\n",
        "val lr = new LogisticRegression()\n",
        "  .setMaxIter(10)\n",
        "  .setRegParam(0.001)\n",
        "val pipeline = new Pipeline()\n",
        "  .setStages(Array(tokenizer, hashingTF, lr))\n",
        "\n",
        "// Fit the pipeline to training documents.\n",
        "val model = pipeline.fit(training)"
      ],
      "metadata": {
        "id": "Nj6nTB1LZx3B"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "// Now we can optionally save the fitted pipeline to disk\n",
        "model.write.overwrite().save(\"/tmp/spark-logistic-regression-model\")\n",
        "\n",
        "// We can also save this unfit pipeline to disk\n",
        "pipeline.write.overwrite().save(\"/tmp/unfit-lr-model\")\n",
        "\n",
        "// And load it back in during production\n",
        "val sameModel = PipelineModel.load(\"/tmp/spark-logistic-regression-model\")\n",
        "\n",
        "// Prepare test documents, which are unlabeled (id, text) tuples.\n",
        "val test = spark.createDataFrame(Seq(\n",
        "  (4L, \"spark i j k\"),\n",
        "  (5L, \"l m n\"),\n",
        "  (6L, \"spark hadoop spark\"),\n",
        "  (7L, \"apache hadoop\")\n",
        ")).toDF(\"id\", \"text\")\n",
        "\n",
        "// Make predictions on test documents.\n",
        "model.transform(test)\n",
        "  .select(\"id\", \"text\", \"probability\", \"prediction\")\n",
        "  .collect()\n",
        "  .foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>\n",
        "    println(s\"($id, $text) --> prob=$prob, prediction=$prediction\")\n",
        "  }"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 781
        },
        "id": "rrIgES8LZ-MS",
        "outputId": "773ad554-751e-4124-bb50-4922a08ff134"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "(4, spark i j k) --> prob=[0.15964077387874118,0.8403592261212589], prediction=1.0\n",
            "(5, l m n) --> prob=[0.8378325685476612,0.16216743145233875], prediction=0.0\n",
            "(6, spark hadoop spark) --> prob=[0.06926633132976273,0.9307336686702373], prediction=1.0\n",
            "(7, apache hadoop) --> prob=[0.9821575333444208,0.01784246665557917], prediction=0.0\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "\u001b[36msameModel\u001b[39m: \u001b[32mPipelineModel\u001b[39m = pipeline_33376d963408\n",
              "\u001b[36mtest\u001b[39m: \u001b[32mDataFrame\u001b[39m = [id: bigint, text: string]"
            ]
          },
          "metadata": {},
          "execution_count": 23
        }
      ]
    }
  ]
}
You need to reload the Google Colab page after installing coursier and Almond.
To @sunnysea @Platinum-Dragon and anyone who is reading this:
The Google Colab interface seems to have changed and no longer lets you use side kernels the way it used to.
I have stopped using Google Colab myself and now use Docker images and containers instead.
To run Scala in Jupyter Notebook as a Docker container, you can follow this guide of mine, which is based on the jupyter/all-spark-notebook Docker image and the latest versions of Almond and Scala.
Happy coding!
I followed prerequisites 1 & 2 exactly, but on #3 I get the same thing: