🤖 Summarize your notes with Joplin AI!

1. Introduction

This work was part of Google Summer of Code, Google's annual summer program that gets people involved in open-source development. Before joining, my only open-source experience was with MindsDB, where I integrated spaCy (a Python NLP package) into their system as part of Hacktoberfest 2023. I ended up as one of the winners, which motivated me to stay in open source!

I then remembered that a friend from college had told me about Google Summer of Code, and that it could be a fantastic experience, so I decided to join the program! I mainly searched for AI-focused projects, but there were few. Luckily, I came across Joplin, which had an idea to create a summariser for notes in its note-taking app! It instantly caught my attention, and I decided that this was the project I wanted to spend my summer on.

I proposed using LLMs (via Transformers.js) and TextRank to summarise notes, notebooks, and highlighted text in the editor. Following discussions with the community during the competition period, I decided to implement the feature as a plugin rather than in the core application. This minimises risk, keeps the feature modular and isolated, and lets users choose whether they want to use the AI feature at all.

1.1 Motivation

The project aims to create note summaries that help users synthesise main ideas and arguments and identify salient points. Users get a clear idea of what a note is about from a short piece of text, with less mental effort.

Example Use Cases:

- Assist in processing notes more efficiently: distil critical information, highlight key ideas, and skim notes quickly.
- Classify or cluster notes by their contents: summarise key concepts from notes and use them to group similar notes. This could be used for tagging notes.
- Distil important information from long notes to power features such as search and question answering.

1.2 Types of Summaries

There are two main types of summarization: extractive and abstractive.

● Extractive summarisation: This method selects sentences directly from the original note based on their importance. The resulting summary contains exact sentences from the original text.

● Abstractive summarisation: Abstractive summarization is closer to what a human usually does, i.e., understand the text, relate it to what they already know, and then re-create its core in a brief text.

Abstractive summarization tends to be more computationally expensive, since it relies on neural networks and generative models. Extractive summarization, on the other hand, can be done without deep learning or labelled data [1].

2. Work Done

2.1 Unsupervised Machine Learning Methods

I started my coding period by researching NLP and implementing unsupervised machine learning methods (TextRank, LexRank, LSA, and KMeans Clustering) for extractive summarisation. Before applying any method, we need to preprocess the note content and vectorise it. This is the usual flow:

1. Get the note body from the Joplin Plugin API.
2. Tokenize sentences with natural: split the note into an array of sentences.
3. Perform "vectorization".
   - Understanding vectorization: in simple terms, it converts sentences into vectors so that we can run various algorithms on them. For example, in LSA, we stack the sentence vectors into a term-sentence matrix and perform SVD; the singular values in the resulting diagonal matrix indicate the most important dimensions.
     - With those dimensions, we can determine which sentences are the most important: "Salient and recurring word patterns are likely to be captured and represented by a singular vector. The corresponding eigen value indicates the degree of importance of the pattern. Sentences containing this pattern will be projected along this vector and the sentence that best represents this pattern will have the largest component along this vector" [8].
   - Vectorization methods:
     - Binary matrix: each sentence becomes a binary vector marking which words it contains
     - TF-IDF: each word is weighted by its frequency in the sentence and by how rare it is across the note's sentences
     - Word2Vec: creates word embeddings [3], which are good at capturing semantic relationships between words
4. Apply unsupervised machine learning algorithms to the vectors.
   - KMeans Clustering example:
     - [STEP 1] Choose k and pick k random sentence vectors as the initial centroids
     - [STEP 2] Assign each sentence vector to its nearest centroid to form k clusters, then move each centroid to the mean of its cluster
     - [STEP 3] Repeat STEP 2 until convergence (the assignments stop changing)
     - [STEP 4] The most important sentences are the ones closest to the centroids
     - [STEP 5] Take either k sentences or m sentences closest to the k centroids and include them in the final summary
5. Take the top n sentences from the result to form the summary.
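To make the flow above concrete, here is a minimal TypeScript sketch of steps 2 to 5 using TF-IDF vectors and KMeans. It is not the plugin's code, and several choices are assumptions:

- A naive regex sentence splitter stands in for natural's tokenizer.
- TF-IDF treats each sentence as a "document", with a smoothed IDF of `log(n / df) + 1`.
- The function names (`splitSentences`, `tfidf`, `kMeans`, `summarise`) are hypothetical.

```typescript
type Vector = number[];

// Naive splitter (assumption): stands in for natural's sentence tokenizer.
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+(?:[.!?]+|$)/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

function tokenize(sentence: string): string[] {
  return sentence.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

// TF-IDF with each sentence treated as a "document"; IDF is smoothed with +1.
function tfidf(sentences: string[]): Vector[] {
  const docs = sentences.map(tokenize);
  const vocab = Array.from(new Set(docs.reduce<string[]>((all, d) => all.concat(d), [])));
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const w of Array.from(new Set(doc))) df.set(w, (df.get(w) ?? 0) + 1);
  }
  return docs.map((doc) =>
    vocab.map((w) => {
      const tf = doc.filter((t) => t === w).length / Math.max(doc.length, 1);
      return tf * (Math.log(docs.length / (df.get(w) ?? 1)) + 1);
    }),
  );
}

const dist2 = (a: Vector, b: Vector): number =>
  a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0);

const argmin = (xs: number[]): number =>
  xs.reduce((best, x, i) => (x < xs[best] ? i : best), 0);

function kMeans(vectors: Vector[], k: number, rng: () => number = Math.random, maxIter = 100): Vector[] {
  // STEP 1: shuffle indices and take k distinct sentence vectors as initial centroids
  const idx = vectors.map((_, i) => i);
  for (let i = idx.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  let centroids = idx.slice(0, k).map((i) => vectors[i].slice());
  let labels: number[] = [];
  for (let iter = 0; iter < maxIter; iter++) {
    // STEP 2: assign each vector to its nearest centroid
    const next = vectors.map((v) => argmin(centroids.map((c) => dist2(v, c))));
    // STEP 3: converged once no assignment changes
    if (next.every((l, i) => l === labels[i])) break;
    labels = next;
    // Move each centroid to the mean of its members (an empty cluster keeps its centroid)
    centroids = centroids.map((c, j) => {
      const members = vectors.filter((_, i) => labels[i] === j);
      if (members.length === 0) return c;
      return c.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return centroids;
}

// STEPS 4-5: for each centroid, keep the closest sentence not already chosen.
function summarise(text: string, k: number, rng: () => number = Math.random): string[] {
  const sentences = splitSentences(text);
  if (sentences.length <= k) return sentences;
  const vectors = tfidf(sentences);
  const chosen = new Set<number>();
  for (const c of kMeans(vectors, k, rng)) {
    const ranked = vectors
      .map((v, i) => ({ d: dist2(v, c), i }))
      .sort((a, b) => a.d - b.d);
    const pick = ranked.find(({ i }) => !chosen.has(i));
    if (pick) chosen.add(pick.i);
  }
  // Keep the original sentence order so the summary reads naturally
  return Array.from(chosen).sort((a, b) => a - b).map((i) => sentences[i]);
}
```

Returning the chosen sentences in their original order keeps the summary readable. Because KMeans starts from random centroids, two runs can pick different sentences. This is one reason the choice of k matters so much.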

2.1.1 Evaluation of Different Methods

| Algorithm | Description | Weakness | Link |
|---|---|---|---|
| TextRank | TextRank is a graph-based ranking algorithm inspired by PageRank. It connects words or sentences based on how frequently they appear near each other in the text and uses the number of shared words between sentences to establish similarity. | May not capture complex relationships between sentences accurately. | https://blogs.cornell.edu/info2040/2018/10/22/40068/ |
| LexRank | LexRank is similar to TextRank but uses the cosine similarity of TF-IDF sentence vectors, and is more tailored towards extracting information from multiple texts written about the same topic. | May not perform well on a set of unclustered/unrelated documents. | http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html |
| LSA | LSA creates a term-sentence matrix (the frequency of words within the sentences of the document) and then applies SVD (Singular Value Decomposition) to learn the relationships between words and sentences. | Struggles with polysemy and synonyms. | Latent Semantic Analysis |
| KMeans Clustering | KMeans Clustering groups sentence vectors into clusters. The sentence vectors closest to the cluster centroids are included in the summary. | Choosing the best value of k in advance. | KMeans Clustering with TF-IDF and KMeans Clustering with word2vec |
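LexRank scores sentences by running PageRank over a sentence-similarity graph. TextRank works the same way and differs mainly in its similarity function (word overlap).

The sketch below makes a few assumptions:

- The sentence vectors are already computed, for example the TF-IDF vectors from the flow described earlier.
- It uses the continuous (weighted-edge) variant of LexRank rather than a similarity threshold.
- The damping factor of 0.85 and the names `lexRank` and `topSentences` are illustrative choices, not the plugin's code.

```typescript
type Vector = number[];

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Continuous LexRank: PageRank-style power iteration over a cosine-similarity graph.
function lexRank(vectors: Vector[], damping = 0.85, tol = 1e-8, maxIter = 200): number[] {
  const n = vectors.length;
  // Edge weights: pairwise cosine similarity, no self-loops
  const sim = vectors.map((a, i) => vectors.map((b, j) => (i === j ? 0 : cosine(a, b))));
  const rowSums = sim.map((row) => row.reduce((s, x) => s + x, 0));
  let scores: number[] = new Array(n).fill(1 / n);
  for (let iter = 0; iter < maxIter; iter++) {
    const next = scores.map((_, j) => {
      let incoming = 0;
      for (let i = 0; i < n; i++) {
        // A sentence similar to nothing spreads its score uniformly
        const share = rowSums[i] === 0 ? 1 / n : sim[i][j] / rowSums[i];
        incoming += scores[i] * share;
      }
      return (1 - damping) / n + damping * incoming;
    });
    const delta = next.reduce((s, x, i) => s + Math.abs(x - scores[i]), 0);
    scores = next;
    if (delta < tol) break;
  }
  return scores;
}

// Pick the n highest-scoring sentences, returned in their original order.
function topSentences(sentences: string[], vectors: Vector[], n: number): string[] {
  const scores = lexRank(vectors);
  return scores
    .map((score, i) => ({ score, i }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map(({ i }) => i)
    .sort((a, b) => a - b)
    .map((i) => sentences[i]);
}
```

Sentences that are similar to many other sentences collect more score. An off-topic sentence ends up at the bottom of the ranking.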

2.2 LLMs

2.2.1 ONNX (Open Neural Network Exchange)

ONNX Runtime is a cross-platform machine-learning model accelerator with a flexible interface for integrating hardware-specific libraries. It can be used with models from PyTorch, TensorFlow/Keras, TFLite, scikit-learn, and other frameworks. Its web build, onnxruntime-web, enables us to run ML models in web browsers.

2.2.2 Transformers.js

Transformers.js is a new state-of-the-art machine-learning library by Hugging Face for the web. With this library, we can run pre-trained transformer models, or our own custom ML models, in browsers! We do not have to do custom training for abstractive summarization, since we can use pre-trained models tailored for summarization. I will test and benchmark multiple models based on summary quality and inference time. The best model I have found so far is google/flan-t5-small (±60MB). In the future, I would like to use google/flan-t5-base (±200MB) instead, since it performs much better, although it has longer inference times and takes more memory. If it cannot be uploaded to NPM, the plugin will need to download google/flan-t5-base when users install it.

| ML model | Link |
|---|---|
| facebook/bart-large-cnn | https://huggingface.co/facebook/bart-large-cnn |
| sshleifer/distilbart-cnn-12-6 | https://huggingface.co/sshleifer/distilbart-cnn-12-6 |
| google/pegasus-xsum | https://huggingface.co/google/pegasus-xsum |
| google/flan-t5-small | https://huggingface.co/google/flan-t5-small |
| MBZUAI/LaMini-Flan-T5-248M | https://huggingface.co/MBZUAI/LaMini-Flan-T5-248M |
| sshleifer/distilbart-xsum-6-6 | https://huggingface.co/sshleifer/distilbart-xsum-6-6 |
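The plan is to compare the models in the table on summary quality and inference time. A small harness keeps the timing side consistent across models.

This is a sketch under assumptions:

- `benchmark` and `Summariser` are hypothetical names.
- Each summariser is an opaque async function. In the plugin, each would presumably wrap a Transformers.js pipeline for one of the models above.
- Timings exclude one warm-up call, because the first call may include loading the model.

```typescript
// Any async function from note text to summary text. Hypothetically, each one could
// wrap a Transformers.js summarization pipeline created for a different model.
type Summariser = (text: string) => Promise<string>;

interface BenchmarkResult {
  model: string;
  meanMs: number;
  summary: string;
}

async function benchmark(
  summarisers: Record<string, Summariser>,
  text: string,
  runs = 3,
): Promise<BenchmarkResult[]> {
  const results: BenchmarkResult[] = [];
  for (const [model, summarise] of Object.entries(summarisers)) {
    // Warm-up call so one-off costs (e.g. loading weights) are not counted
    await summarise(text);
    let totalMs = 0;
    let summary = "";
    for (let i = 0; i < runs; i++) {
      const start = performance.now();
      summary = await summarise(text);
      totalMs += performance.now() - start;
    }
    results.push({ model, meanMs: totalMs / runs, summary });
  }
  // Fastest first; summary quality still has to be judged by reading `summary`
  return results.sort((a, b) => a.meanMs - b.meanMs);
}
```

The harness only ranks speed. Keeping the last summary alongside each timing makes it easy to judge quality by hand on the same input.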

I initially had problems getting Transformers.js to work in the Joplin plugin. The main issue was that, when bundled with Webpack, it needed node-loader, and at runtime node-loader could not find the .node files in the dist folder. To avoid depending on node-loader, a mentor (Laurent Cozic) recommended using web workers instead. The Joplin team had run into a similar problem with Tesseract.js and its .wasm files in a plugin, and had solved it by running a web worker and loading the package in the app. I therefore wrote a tech spec for a generic web worker that future contributors and developers could use. It would bring benefits such as running computations in the background, keeping the application's main thread from blocking, and running packages that break when bundled with Webpack.

That proved to be more effort than expected and out of the project's scope. Luckily, another community member suggested a way to run it in the plugin: download the ONNX .wasm files and load them locally, run the code in a web environment, and configure this in webpack.config.js. With a few more changes, I got it running in the plugin! The disadvantage of this approach is that it does not handle every case, and developers have to be experienced with Webpack and write the configuration in webpack.config.js themselves. Furthermore, if developers update webpack.config.js, all of these configurations are lost. Therefore, the generic web worker would be more desirable, though more challenging to implement.

2.3 UI functionalities

I created a survey to find out which unsupervised machine learning algorithm produced the best summaries in terms of quality and length. It did not get much engagement, and I struggled to choose the best one.

Later, I came up with the idea of letting users craft summaries! There are options to choose the algorithm and the length of the summary. Once the algorithm outputs a summary, users can edit it in the text area. When users save it, they are redirected to the summary note's details page in the panel, or the summary is created above the current note. In the panel, summaries are exported into a TipTap editor, where users can freely edit and style the text too! The plugin lets users control and craft their summaries, which I think is pretty cool and unique!

2.3.3 Panel

joplin-plugin-ai-summarisation-panel.mp4

2.3.4 Context Menus

joplin-plugin-ai-summarisation-context-menu.mp4

2.4 Flowchart

```mermaid
flowchart LR
   A[Opening Joplin]-.-> B[Using the Panel]
   A[Opening Joplin]-.-> C[Using Context Menus]
   C -.-> D[Click on the Notebook]
   C -.-> E[Click on the Note]
   E -.-> F[Right-click on the note]
   E -.-> G[Highlight multiple text in the note]
   F -.-> H[Summarize the note]
   G -.-> I[Right-click on the text]
   I -.-> J[Summarize the highlighted text]
   B -.-> K[Click on the note in the notebook tree]
   K -.-> L[Edit the summary, configure length and choose different algorithms]
   L -.-> M[Click save]
   M -.-> N[edit, change font-weight, etc.]
   D -.-> O[Right-click on the notebook]
   O -.-> P[Summarize the notebook]
```

3. Future Work

3.1 UI/UX

There are still things that could be added, such as editing the summary title, returning to the crafting configuration of already summarised notes, and more.

A nice member of the community gave me some feedback and suggestions for improvement. The feedback can be found here.

3.2 AI

Using word2vec might improve the quality of summaries produced by the unsupervised methods [4], since it captures semantic relationships between words, unlike TF-IDF, which is based only on the frequency and importance of words. Instead of KMeans Clustering, there could be an option to use hierarchical clustering. Its advantage is that we do not have to define k in advance; the algorithm determines the number of clusters itself. Some community members recommended using HDBSCAN. You can find the discussion here.
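For the hierarchical option, here is a minimal sketch of bottom-up (agglomerative) clustering with average linkage over cosine distance. The distance `threshold` replaces k: merging stops when the closest pair of clusters is further apart than the threshold, so the number of clusters comes from the data.

The threshold, the linkage choice, and the function names are assumptions. The sentence vectors could be TF-IDF or averaged word2vec embeddings. HDBSCAN is a different, density-based method and is not shown here.

```typescript
type Vector = number[];

function cosineDistance(a: Vector, b: Vector): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 1 : 1 - dot / Math.sqrt(na * nb);
}

// Average linkage: mean pairwise distance between the members of two clusters.
function averageLinkage(a: number[], b: number[], vectors: Vector[]): number {
  let sum = 0;
  for (const i of a) for (const j of b) sum += cosineDistance(vectors[i], vectors[j]);
  return sum / (a.length * b.length);
}

// Agglomerative clustering: start with one cluster per sentence and keep merging the
// closest pair until no pair is within `threshold`. No k is chosen up front.
function agglomerate(vectors: Vector[], threshold: number): number[][] {
  let clusters = vectors.map((_, i) => [i]);
  while (clusters.length > 1) {
    let best = { d: Infinity, i: -1, j: -1 };
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        const d = averageLinkage(clusters[i], clusters[j], vectors);
        if (d < best.d) best = { d, i, j };
      }
    }
    if (best.d > threshold) break;
    const merged = clusters[best.i].concat(clusters[best.j]);
    clusters = clusters.filter((_, k) => k !== best.i && k !== best.j);
    clusters.push(merged);
  }
  return clusters;
}
```

This naive loop is fine for the sentences of a single note. It scales poorly, so whole notebooks would need a smarter implementation or a library.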

Another enhancement could be to apply dimensionality reduction to sentence embeddings from word2vec. This would help tackle the curse of dimensionality: the higher the dimensionality of the data, the sparser the sentence vectors are in the space. Furthermore, in high dimensions, the differences in distances between data points tend to become negligible, making measures like Euclidean distance less meaningful. We could use dimensionality reduction techniques such as UMAP or t-SNE.
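The claim that distances lose meaning in high dimensions can be checked with a quick experiment. The sketch below samples random points in a unit hypercube and measures the relative contrast `(max - min) / min` of their distances to one query point. As the dimension grows, this ratio shrinks, so "nearest" and "farthest" become hard to tell apart.

Uniform random data and the seeded PRNG (mulberry32) are assumptions chosen for reproducibility. The sketch does not implement UMAP or t-SNE.

```typescript
// Small seeded PRNG (mulberry32) so the demo is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Relative contrast (max - min) / min of Euclidean distances from one random query
// point to `points` random points in the unit hypercube of dimension `dim`.
function relativeContrast(dim: number, points: number, rand: () => number): number {
  const sample = (): number[] => Array.from({ length: dim }, () => rand());
  const query = sample();
  const dists = Array.from({ length: points }, () => {
    const p = sample();
    return Math.sqrt(p.reduce((s, x, i) => s + (x - query[i]) ** 2, 0));
  });
  const min = Math.min(...dists);
  const max = Math.max(...dists);
  return (max - min) / min;
}

const rand = mulberry32(42);
for (const dim of [2, 10, 100, 1000]) {
  console.log(`dim=${dim}: relative contrast = ${relativeContrast(dim, 200, rand).toFixed(3)}`);
}
```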

In some cases, small LLMs do not perform well on certain texts, especially news articles. To improve this, we could fine-tune the model on a dataset tailored to automatic text summarisation, such as CNN/Daily Mail.

To make inference faster, we could use WebGPU, which can be used with either ONNX Runtime or Transformers.js [11].
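A sketch of how the plugin might choose between WebGPU and the WASM backend at runtime. `pickDevice` and `NavigatorLike` are hypothetical names. The only real API the sketch relies on is WebGPU's `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter is available. The navigator is passed in as a parameter so the function can be tested outside a browser.

```typescript
type Device = "webgpu" | "wasm";

// Minimal slice of `navigator` used here; `gpu` only exists where WebGPU is exposed.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<object | null> };
}

// Prefer WebGPU when an adapter is actually available, otherwise fall back to WASM.
async function pickDevice(nav: NavigatorLike | undefined): Promise<Device> {
  if (!nav?.gpu) return "wasm";
  try {
    // requestAdapter() can resolve to null (e.g. no suitable GPU), so check the result
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm";
  }
}
```

In the plugin, this could be called with the real `navigator` (cast to `NavigatorLike`). The result would then be passed to whichever runtime is used. For example, recent Transformers.js versions accept a `device` option when creating a pipeline, but that option and its accepted values should be checked against the documentation of the installed version.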

3.3 New Plugin API - Generic Web Worker

As explained before, this new Plugin API feature would allow plugin developers to create web workers that run in the core application instead of in the plugin. For more details, you can go here.
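A generic worker API has to match each request with its reply. The hypothetical sketch below shows that plumbing:

- Each call gets an id.
- The worker replies with the same id.
- The caller's promise settles when the matching reply arrives.

None of these names (`WorkerClient`, `dispatch`, `Channel`) come from Joplin's Plugin API or the tech spec. They are assumptions for illustration, and the channel is abstracted so the logic can run without a real worker.

```typescript
// Hypothetical message shapes; not an existing Joplin Plugin API.
interface WorkerRequest {
  id: number;
  method: string;
  args: unknown[];
}

interface WorkerResponse {
  id: number;
  result?: unknown;
  error?: string;
}

// Anything that can post requests and deliver responses; a Web Worker or
// MessagePort could be adapted to this shape.
interface Channel {
  postMessage(msg: WorkerRequest): void;
  onmessage: ((ev: { data: WorkerResponse }) => void) | null;
}

type Handlers = Record<string, (...args: any[]) => unknown>;

// Worker side: run the requested method and build the reply.
function dispatch(handlers: Handlers, req: WorkerRequest): WorkerResponse {
  const handler = handlers[req.method];
  if (!handler) return { id: req.id, error: `Unknown method: ${req.method}` };
  try {
    return { id: req.id, result: handler(...req.args) };
  } catch (e) {
    return { id: req.id, error: String(e) };
  }
}

// Main-thread side: turn request/response messages into promises, matched by id.
class WorkerClient {
  private nextId = 0;
  private pending = new Map<number, { resolve: (v: unknown) => void; reject: (e: Error) => void }>();
  private channel: Channel;

  constructor(channel: Channel) {
    this.channel = channel;
    channel.onmessage = ({ data }) => {
      const entry = this.pending.get(data.id);
      if (!entry) return;
      this.pending.delete(data.id);
      if (data.error !== undefined) entry.reject(new Error(data.error));
      else entry.resolve(data.result);
    };
  }

  call<T>(method: string, ...args: unknown[]): Promise<T> {
    const id = this.nextId++;
    return new Promise<T>((resolve, reject) => {
      this.pending.set(id, { resolve: (v) => resolve(v as T), reject });
      this.channel.postMessage({ id, method, args });
    });
  }
}
```

Keeping the worker side as a table of named handlers means a plugin would only register functions such as a summariser. The message passing stays generic.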

4. Reflection

I would like to first thank Joplin's mentors and community, who embarked on this journey with me. It has been a truly amazing experience for me!! I am very grateful for that, and I hope people (including you who are reading this!) enjoy the new plugin!!

It was really fun to dive into the world of NLP!! However, I really struggled with the limitations of the AI ecosystem in JavaScript. Still, it forced me to implement the algorithms from scratch, which was really cool! It was also cool to visualise and apply some linear algebra concepts! Normally, I would use scikit-learn, or other machine-learning packages if equivalents existed in JavaScript. Having such packages would be very beneficial if we want the most optimal unsupervised algorithms. For example, I wanted to do coreference resolution, which means matching pronouns in later sentences with the nouns they refer to (Spiderman is cool. He can fly! -> Spiderman is cool. Spiderman can fly!). That would strengthen the connections between sentences. One way is Hobbs' algorithm. However, it is heuristics-based, so it lacks an understanding of the semantic connections between sentences (and the distance between sentences is also a problem). Using neural networks would be much better, as they cover more cases and reduce false positives (e.g., spaCy's coref)!
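The Spiderman example can be reproduced with a deliberately naive rule, which also shows why heuristics fall short. This is not Hobbs' algorithm, which walks syntactic parse trees. It is a hypothetical `resolvePronouns` helper that replaces a sentence-initial He/She/It/They with the first capitalised word of the most recent sentence that had one.

```typescript
// Naive heuristic (NOT Hobbs' algorithm): a sentence-initial subject pronoun is replaced
// by the first capitalised word of the most recent sentence that had one.
const SUBJECT_PRONOUNS = new Set(["He", "She", "It", "They"]);

function resolvePronouns(text: string): string {
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  let antecedent: string | null = null;
  const out: string[] = [];
  for (const sentence of sentences) {
    const [first, ...rest] = sentence.split(/\s+/);
    const resolved =
      SUBJECT_PRONOUNS.has(first) && antecedent !== null ? [antecedent, ...rest].join(" ") : sentence;
    // Guess the next antecedent: first capitalised, non-pronoun word (no syntax, no semantics)
    const candidate = resolved
      .split(/\s+/)
      .find((w) => /^[A-Z][a-z]+$/.test(w) && !SUBJECT_PRONOUNS.has(w));
    if (candidate !== undefined) antecedent = candidate;
    out.push(resolved);
  }
  return out.join(" ");
}
```

On "Spiderman is cool. He can fly!" it produces the expected "Spiderman is cool. Spiderman can fly!". On "Peter met Mary. She smiled." it produces "Peter met Mary. Peter smiled.", exactly the kind of false positive a neural coreference model is meant to avoid.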

However, the future of the JavaScript/TypeScript ecosystem looks bright. Apart from ONNX Runtime and Transformers.js, there is, for example, Pyodide, which allows us to run Python packages in the browser. We could use scikit-learn and other scientific libraries, which would be very useful and would let us focus on implementing theoretical solutions.

The main learnings and takeaways from the project:

- WebAssembly (WASM) allows AI models and algorithms to run on the web, so it is important to understand and use it. "It provides a way to run code written in multiple languages on the web at near-native speed, with client apps running on the web that previously couldn't have done so." [12]
- Before starting implementation, it is useful to write a tech spec that provides a comprehensive overview of the system and enables other engineers to give more meaningful input.
- While there are open-source AI libraries and packages in JavaScript/TypeScript, they often don't match the exact implementations of the algorithms in the original research papers. This gap underscores the need for more AI engineers in the JavaScript/TypeScript ecosystem. That said, implementing algorithms from scratch gives you more control and more ways to change and improve the model.
- The community is everything in open source; you will meet people from all walks of life with different technical and human experiences. They are essential for making the application and its features the best they can be.
- I now understand why some people advocate "Release early, release often." With all the feedback, the application/feature improves with each release.

Technologies: JavaScript/TypeScript, Webpack, natural, onnxruntime-web, Transformers.js, mathjs, WebAssembly (.wasm), React, styled-components, Chakra UI, TipTap, Jest

Last Remarks

Anyway, this is not the end!! I will still be around after the program, since the plugin can still be improved. I think it is nice that, in the end, users can edit summaries in the TipTap editor and add them to their actual notes! Furthermore, I definitely want to implement a new Plugin API for web workers that makes it easy for future contributors to create workers!

I’m happy to be Joplin's GSoC 2024 contributor!