Skip to content

Instantly share code, notes, and snippets.

@Kishan-Ved
Last active February 26, 2025 09:41
Show Gist options
  • Save Kishan-Ved/ebe0a971220d67517ae815e4f92d2459 to your computer and use it in GitHub Desktop.
Save Kishan-Ved/ebe0a971220d67517ae815e4f92d2459 to your computer and use it in GitHub Desktop.
Google Summer of Code 2024 Report - Kishan Ved

Google Summer of Code 2024 Report

This report summarizes the work done by me (Kishan Ved) for Google Summer of Code 2024 with the NumFOCUS organization, on the project Open Science Labs: PyDataStructs: Add a C++ Backend for tree data structures and their algorithms. Weekly reports are available here: GSoC BLogs

image

About me

I am Kishan Ved, an undergraduate student at the Indian Institute of Technology Gandhinagar (IIT Gandhinagar), India, in the department of Computer Science and Engineering.

About PyDataStructs

PyDataStructs aims to be a Python package for various data structures in computer science. It is under development involving addition of algorithms and their parallel implementations. To the best of our knowledge, PyDataStructs is the first well-designed library/package which has covered most of the data structures and algorithms.

Features:

  • A single package for all your data structures and algorithms - We have and are implementing many popular and useful data structures and algorithms.

  • Consistent and Clean Interface - The APIs we have provided are consistent with each other, clean and easy to use. We make sure of that before adding any new data structure or algorithm.

  • Well Tested - We thoroughly test our code before making any new addition to PyDataStructs. 99 percent lines of our code have already been tested by us.

So, you can easily rely on PyDataStructs for any data structure or algorithm you want to use without worrying about implementing it from scratch. Everything is just a few calls away.

GSoC Project Goals

Project link: Open Science Labs: PyDataStructs: Add a C++ Backend for tree data structures and their algorithms

My project involved adding a C++ backend for all tree data structures in PyDataStructs, a Python package for advanced data structures and algorithms. The user has an option to select either the Python backend or the C++ backend.

tree = RedBlackTree(backend=Backend.CPP)

For any data structure, the Python backend is developed first, and once completely tested and ready, its C++ backend is developed. Both the backends share full functionality and are completely compatible. The C++ backend is extremely fast, it executes codes 8-10 times faster. This enhances the computation speed, making it extremely valuable for scientific computing and high-performance applications.

Project Outline

I started working from the community bonding period itself, and this gave me a good headstart and allowed me to complete the project in 12 weeks. Here's an outline of all the work I did:

Pre GSoC Work

PR Description Contribution
PR: Added Introsort algorithm ${\color{green}+127}$
PR: Fixed version related bugs ${\color{green}+4}$

Community bonding period

PR Description Contribution
PR: C++ backend for Node, TreeNode, ArrayForTrees, BinaryTree and BinarySearchTree and all tree traversals implemented ${\color{green}+1,936}$

Coding Phase 1

PR Description Contribution
PR: C++ backend for Self Balancing Binary Tree ${\color{green}+328}$
PR: C++ backend for Red Black Trees ${\color{green}+748}$
PR: C++ backend for Binary Indexed Trees ${\color{green}+179}$
PR: C++ backend for Splay Trees ${\color{green}+423}$

Coding Phase 2

PR Description Contribution
PR: C++ backend for AVL Trees ${\color{green}+488}$
PR: C++ backend for Cartesian Trees ${\color{green}+254}$
PR: C++ backend for Treap ${\color{green}+150}$
PR: C++ backend for all trees in binary_trees.py file complete ${\color{green}+72}$
PR: Updated Documentation ${\color{green}+12}$

Contribution Stats:

Lines added: ${\color{green}+4,721}$ (#2 contributor in terms of lines added)

Commits made: 12

Total merged Pull Requests : 12

Here's a complete list of all my merged PRs

Speed results

Click to see a benchmark code

To run this code, you need PyDataStructs installed with the following imports:

import timeit, functools, os, pytest
from pydatastructs.trees.binary_trees import (BinarySearchTree, RedBlackTree)
from pydatastructs.utils.misc_util import Backend
def test_BinarySearchTree(**kwargs):
  cpp = Backend.CPP
  repeat = 1
  number = 1

  size = int(os.environ.get("PYDATASTRUCTS_BENCHMARK_SIZE", "1000"))
  size = kwargs.get("size", size)

  BST = BinarySearchTree
  b1 = BST(backend=Backend.PYTHON)
  b2 = BST(backend=Backend.CPP)

  def f(backend, tree):
      for node in range(-1000,1000):
          tree.insert(node, node)
  def g(backend, tree):
      for node in range(-1000, 1000):
          tree.search(node)
  def h(backend, tree):
      for node in range(2000):
          tree.delete(node)

  kwds_dict_PY = {"backend": Backend.PYTHON, "tree":b1}
  kwds_dict_CPP = {"backend": Backend.CPP, "tree":b2}

  timer_python = timeit.Timer(functools.partial(f, **kwds_dict_PY))
  python_insert = min(timer_python.repeat(repeat, number))

  timer_cpp = timeit.Timer(functools.partial(f, **kwds_dict_CPP))
  cpp_insert = min(timer_cpp.repeat(repeat, number))
  assert cpp_insert < python_insert

  timer_python = timeit.Timer(functools.partial(g, **kwds_dict_PY))
  python_search = min(timer_python.repeat(repeat, number))

  timer_cpp = timeit.Timer(functools.partial(g, **kwds_dict_CPP))
  cpp_search = min(timer_cpp.repeat(repeat, number))
  assert cpp_search < python_search

  timer_python = timeit.Timer(functools.partial(h, **kwds_dict_PY))
  python_delete = min(timer_python.repeat(repeat, number))

  timer_cpp = timeit.Timer(functools.partial(h, **kwds_dict_CPP))
  cpp_delete = min(timer_cpp.repeat(repeat, number))
  assert cpp_delete < python_delete

  print("Python Time:")
  print("insert(): ",python_insert,"s")
  print("search(): ",python_search,"s")
  print("delete(): ",python_delete,"s")
  python_total = python_insert+python_search+python_delete
  print("Total Python time: ", python_total,"s\n")

  print("C++ Time:")
  print("insert(): ",cpp_insert,"s")
  print("search(): ",cpp_search,"s")
  print("delete(): ",cpp_delete,"s")
  cpp_total = cpp_insert+cpp_search+cpp_delete
  print("Total C++ time: ", cpp_total,"s\n")

  print("C++ backend is",round(python_total/cpp_total),"times faster!")
  
test_BinarySearchTree()

Time taken for methods of Binary Search Tree class to execute in different backends:

Centered Image

The picture clearly indicates the utility of the C++ backend. It makes code execution much faster. This is helpful for high-performance computing.

Weekly reports

My Google Summer of Code blogs are available on my website: https://kishan-ved.github.io/portfolio/blog/

My GSoC Journey

Read about my open source experience, right from my very first open source contribution to getting selected for GSoC and completing my Google Summer of Code project!

My Google Summer of Code Journey

Future work

My project is complete, the C++ backend for all trees is fully functional. Some (non-critical) issues have been opened, these need to be addressed. For upcoming plans (and major goals), refer PyDataStructs Wiki on GitHub.

Learnings

GSoC has significantly accelerated my learning curve and taught me to write clean, robust and production ready code for solving complex, real-world problems. Here are some of my key learnings from GSoC:

Tech: Mastered the art of linking a Python code to a C++ backend by using the Python-C API to improve speeds greatly. Polished my C++ and Python coding skills.

GitHub: Learned various new commands, resolution of conflicts and merging branches for collaborative work.

Perseverance: GSoC taught me to read the documentation, be calm and perseverant. It's difficult at the start but smoother ahead!


Thanks to my mentor Gagandeep Singh for his support and guidance. Thanks to Ivan Ogasawara and the team at Open Science Labs and NumFOCUS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment