Currently the RubyGems index is stored as a gzipped file containing a marshalled array. Whenever Bundler needs to install a gem that is not yet installed, it downloads the index, gunzips it and unmarshals it. It then looks up dependencies, which are described in a second file that is also gzipped and marshalled. Several issues arise from this setup:
- The full index must be downloaded and parsed, but it does not contain dependency data, which must then be downloaded and parsed separately. This is a relatively time-consuming process.
- The index must be centralized.
Additionally, the gems themselves are currently centralized, since nothing in the metadata indicates where a gem should be downloaded from. However, in order to allow gems to be hosted elsewhere, it is important to find ways of keeping the index from being poisoned (is this an issue in the centralized system?).
I'd like to propose an alternate way of indexing RubyGems: let's use DNS.
Here's how this might work. For this example, I want to get the latest version of Rails, which is 3.0.1 (in this scenario):
- Client sends question to local name server for ALL records at rails.index.rubygems.org
- Local name server does not have the record so it sends the usual response indicating that the search should go upstream to the roots
- Root delegates to .org name servers
- .org name servers delegate to rubygems.org name servers
- rubygems.org name servers can either respond to the query or delegate to another set of name servers. It'll answer in this case.
When a query is received for a specific gem then a collection of PTR records will be returned that represents all available versions for that gem:
rails.index.rubygems.org. 84600 PTR 1.0.3.rails.index.rubygems.org.
rails.index.rubygems.org. 84600 PTR 2.0.3.rails.index.rubygems.org.
rails.index.rubygems.org. 84600 PTR 3.0.3.rails.index.rubygems.org.
If a specific version is requested then PTR records will be returned that represent all of the dependencies for that version. For example:
1.0.3.rails.index.rubygems.org. 84600 PTR 0.0.3.activesupport.index.rubygems.org.
1.0.3.rails.index.rubygems.org. 84600 PTR 0.0.3.actionpack.index.rubygems.org.
1.0.3.rails.index.rubygems.org. 84600 PTR 0.0.3.activerecord.index.rubygems.org.
1.0.3.rails.index.rubygems.org. 84600 PTR 0.0.3.activeresource.index.rubygems.org.
1.0.3.rails.index.rubygems.org. 84600 PTR 0.0.3.actionmailer.index.rubygems.org.
1.0.3.rails.index.rubygems.org. 84600 PTR 0.0.3.railties.index.rubygems.org.
1.0.3.rails.index.rubygems.org. 84600 PTR 1.bundler.index.rubygems.org.
Note that some PTR records represent canonical gem names and others would be a CNAME pointing to the appropriate canonical version. The last record is an example of this where the CNAME record would likely resolve to something like 7.0.1.bundler.index.rubygems.org (which would be the reverse notation for bundler-1.0.7). This also allows for ~>, = and >= support and, with some small CNAME manipulations, <, <= and != as well. More information on this below.
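The reverse notation can be sketched as a pair of helper functions. This is only an illustration: the function names are hypothetical, and it assumes that a gem name fits in a single DNS label and that versions consist of dot-separated segments.

```python
from __future__ import annotations

INDEX_ZONE = "index.rubygems.org"  # zone name taken from the examples above


def index_name(gem: str, version: str, zone: str = INDEX_ZONE) -> str:
    """Build the DNS name for a gem version by reversing its version segments."""
    reversed_version = ".".join(reversed(version.split(".")))
    return f"{reversed_version}.{gem}.{zone}"


def parse_index_name(name: str, zone: str = INDEX_ZONE) -> tuple[str, str]:
    """Recover (gem, version) from a name built by index_name."""
    suffix = "." + zone
    stripped = name.rstrip(".")  # tolerate a fully qualified trailing dot
    if not stripped.endswith(suffix):
        raise ValueError(f"{name!r} is not inside {zone!r}")
    labels = stripped[: -len(suffix)].split(".")
    gem = labels[-1]  # assumption: the gem name is exactly one label
    version = ".".join(reversed(labels[:-1]))
    return gem, version
```

With these helpers, bundler-1.0.7 maps to 7.0.1.bundler.index.rubygems.org, and 1.0.3.rails.index.rubygems.org. maps back to rails 3.0.1.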
If the latest version of a gem is requested:
latest.rails.index.rubygems.org. 600 CNAME 1.0.3.rails.index.rubygems.org.
For instance, the Amalgalite 1.0.0 gem has runtime dependencies on
- arrayfields ~> 4.7.4
- fastercsv ~> 1.5.4
This can be modeled with the following set of records
latest.amalgalite.index.rubygems.org 600 CNAME 0.0.1.amalgalite.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR 5.1.fastercsv.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR 7.4.arrayfields.index.rubygems.org
5.1.fastercsv.index.rubygems.org 600 CNAME 4.5.1.fastercsv.index.rubygems.org
7.4.arrayfields.index.rubygems.org 600 CNAME 4.7.4.arrayfields.index.rubygems.org
This is not exactly the same as ~>, but it is close enough: ~> 1.5.4 accepts any version >= 1.5.4 and < 1.6, whereas 5.1.fastercsv.index.rubygems.org would be a CNAME record for the latest 1.5.x version of fastercsv.
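To make the lookup concrete, here is a minimal in-memory sketch of how a client might follow these records. The dictionaries stand in for real DNS queries, the function names are hypothetical, and the arrayfields CNAME is keyed as 7.4.arrayfields so that it matches the PTR target.

```python
# Records from the amalgalite example, held in plain dictionaries.
CNAME = {
    "latest.amalgalite.index.rubygems.org": "0.0.1.amalgalite.index.rubygems.org",
    "5.1.fastercsv.index.rubygems.org": "4.5.1.fastercsv.index.rubygems.org",
    "7.4.arrayfields.index.rubygems.org": "4.7.4.arrayfields.index.rubygems.org",
}
PTR = {
    "0.0.1.amalgalite.index.rubygems.org": [
        "5.1.fastercsv.index.rubygems.org",
        "7.4.arrayfields.index.rubygems.org",
    ],
}


def canonical(name, max_hops=8):
    """Follow CNAME records until a name without a CNAME is reached."""
    hops = 0
    while name in CNAME:
        if hops == max_hops:
            raise RuntimeError("CNAME chain too long or looping")
        name = CNAME[name]
        hops += 1
    return name


def dependencies(name):
    """Return the canonical names of the dependencies of a gem version."""
    return [canonical(target) for target in PTR.get(canonical(name), [])]
```

Asking for the dependencies of latest.amalgalite.index.rubygems.org first resolves the latest alias, then resolves each PTR target to a concrete version name.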
For an = dependency, the records would be:
latest.amalgalite.index.rubygems.org 600 CNAME 0.0.1.amalgalite.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR 4.5.1.fastercsv.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR 4.7.4.arrayfields.index.rubygems.org
And for a >= dependency, the records would point at the most recent release of the gem in question, which is always found through the latest CNAME for that gem name:
latest.amalgalite.index.rubygems.org 600 CNAME 0.0.1.amalgalite.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR latest.fastercsv.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR latest.arrayfields.index.rubygems.org
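The mapping from the three simple operators to a PTR target could be sketched as follows. The function names are hypothetical, and the handling of ~> (drop the last version segment, then reverse) is inferred from the examples above rather than specified anywhere.

```python
def reverse_version(version):
    """Reverse dot-separated version segments: '1.5.4' -> '4.5.1'."""
    return ".".join(reversed(version.split(".")))


def dependency_target(gem, operator, version, zone="index.rubygems.org"):
    """Return the name a dependency's PTR record would point to."""
    if operator == "=":
        label = reverse_version(version)
    elif operator == "~>":
        segments = version.split(".")
        if len(segments) < 2:
            raise ValueError("~> needs at least two version segments")
        # ~> 1.5.4 is served by the alias for the 1.5 series: '5.1'.
        label = reverse_version(".".join(segments[:-1]))
    elif operator == ">=":
        # As described above, >= simply follows the latest release.
        label = "latest"
    else:
        raise ValueError(f"{operator!r} needs a per-dependent CNAME record")
    return f"{label}.{gem}.{zone}"
```

Any other operator is rejected here because it cannot be expressed as a shared alias.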
For <, <= and != dependencies, or dependencies with more than one requirement, we need to do some trickery. The PTR record for the dependency will point to a CNAME record prefixed with the name of the dependency. This CNAME record will then be updated as new versions of the given gem are released, as long as the new version still satisfies the dependency.
Let's look at some examples:
"< 4.5.1" for fastercsv and "<= 4.7.4" for arrayfields:
latest.amalgalite.index.rubygems.org 600 CNAME 0.0.1.amalgalite.index.rubygems.org
fastercsv.0.0.1.amalgalite.index.rubygems.org 600 CNAME 0.5.4.fastercsv.index.rubygems.org
arrayfields.0.0.1.amalgalite.index.rubygems.org 600 CNAME 4.7.4.arrayfields.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR fastercsv.0.0.1.amalgalite.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR arrayfields.0.0.1.amalgalite.index.rubygems.org
The != dependency op is essentially the same. Consider "!= 4.5.2" for fastercsv:
latest.amalgalite.index.rubygems.org 600 CNAME 0.0.1.amalgalite.index.rubygems.org
fastercsv.0.0.1.amalgalite.index.rubygems.org 600 CNAME 1.5.4.fastercsv.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR fastercsv.0.0.1.amalgalite.index.rubygems.org
If a patch release of fastercsv were published with the version 4.5.2, then the CNAME record for fastercsv.0.0.1.amalgalite.index.rubygems.org would not change, since that version is excluded. On the other hand, a patch release of 4.5.3 would cause the CNAME to change:
latest.amalgalite.index.rubygems.org 600 CNAME 0.0.1.amalgalite.index.rubygems.org
fastercsv.0.0.1.amalgalite.index.rubygems.org 600 CNAME 3.5.4.fastercsv.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR fastercsv.0.0.1.amalgalite.index.rubygems.org
The same approach covers dependencies with more than one requirement. For example, if the dependency is on fastercsv [">= 1.0.4", "< 1.7.0"] and the current version of fastercsv is 1.1.0, then the records would look like this:
latest.amalgalite.index.rubygems.org 600 CNAME 0.0.1.amalgalite.index.rubygems.org
fastercsv.0.0.1.amalgalite.index.rubygems.org 600 CNAME 0.1.1.fastercsv.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR fastercsv.0.0.1.amalgalite.index.rubygems.org
If the version of fastercsv was changed to 1.6.9 then the records would be:
latest.amalgalite.index.rubygems.org 600 CNAME 0.0.1.amalgalite.index.rubygems.org
fastercsv.0.0.1.amalgalite.index.rubygems.org 600 CNAME 9.6.1.fastercsv.index.rubygems.org
0.0.1.amalgalite.index.rubygems.org 84600 PTR fastercsv.0.0.1.amalgalite.index.rubygems.org
And if the version was changed to 1.7.0 or higher, the CNAME would not change.
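The update rule for these per-dependent CNAME records can be sketched as "point at the highest released version that satisfies every requirement". The following is a minimal illustration under simplifying assumptions: the function names are hypothetical, versions are purely numeric with the same number of segments, and prerelease versions are ignored.

```python
import operator

OPERATORS = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "=": operator.eq, "!=": operator.ne,
}


def reverse_version(version):
    return ".".join(reversed(version.split(".")))


def version_key(version):
    """Turn '1.6.9' into (1, 6, 9) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, requirements):
    """Check a version against requirement strings such as '>= 1.0.4'."""
    for requirement in requirements:
        op, wanted = requirement.split()
        if not OPERATORS[op](version_key(version), version_key(wanted)):
            return False
    return True


def dependency_cname(dependent, dependent_version, gem, released, requirements,
                     zone="index.rubygems.org"):
    """Return (owner, target) for the CNAME record, or None if nothing matches."""
    matching = [v for v in released if satisfies(v, requirements)]
    if not matching:
        return None
    best = max(matching, key=version_key)
    owner = f"{gem}.{reverse_version(dependent_version)}.{dependent}.{zone}"
    target = f"{reverse_version(best)}.{gem}.{zone}"
    return owner, target
```

Re-running dependency_cname whenever fastercsv publishes a release yields the same target once a version falls outside the requirements, which is why a 1.7.0 release leaves the CNAME untouched.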
All of the above dependencies are assumed to be runtime dependencies. Suppose that, using the gem command, you typed:
gem install --development amalgalite
That would install all of amalgalite's development dependencies as well. To provide the same functionality, we will add additional PTR records for all of the development dependencies, using 'gemname-development' as the namespace:
latest.amalgalite.index.rubygems.org 600 CNAME 0.0.1.amalgalite.index.rubygems.org
0.0.1.amalgalite-development.index.rubygems.org 84600 PTR 8.0.rake.index.rubygems.org
0.0.1.amalgalite-development.index.rubygems.org 84600 PTR 2.1.configuration.index.rubygems.org
0.0.1.amalgalite-development.index.rubygems.org 84600 PTR 5.2.rspec.index.rubygems.org
8.0.rake.index.rubygems.org 600 CNAME 0.8.0.rake.index.rubygems.org
2.1.configuration.index.rubygems.org 600 CNAME 0.2.1.configuration.index.rubygems.org
etc ...
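On the client side, the development namespace only changes which names are queried. A small sketch, with hypothetical function names:

```python
def dependency_owner(gem, version, development=False, zone="index.rubygems.org"):
    """Name that holds the PTR records for a gem version's dependencies."""
    reversed_version = ".".join(reversed(version.split(".")))
    namespace = f"{gem}-development" if development else gem
    return f"{reversed_version}.{namespace}.{zone}"


def names_to_query(gem, version, include_development=False):
    """Runtime dependencies are always queried; development ones on request."""
    names = [dependency_owner(gem, version)]
    if include_development:
        names.append(dependency_owner(gem, version, development=True))
    return names
```

A plain install would query only the first name, while the --development install shown above would query both.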
In addition to dependency management another interesting use of DNS is to provide references to where gems can be downloaded. Here is how this might work:
- Client sends question to local name server for ALL records at rails.index.rubygems.org
- Local name server does not have the record so it sends the usual response indicating that the search should go upstream to the roots
- Root delegates to .org name servers
- .org name servers delegate to rubygems.org name servers
- rubygems.org name servers can either respond to the query or delegate to another set of name servers. It'll answer in this case.
- When queried for latest.rails.index.rubygems.org, the rubygems.org name servers respond with a CNAME record pointing to 1.0.3.rails.index.rubygems.org and all NAPTR records for 1.0.3.rails.index.rubygems.org, for example:
latest.rails.index.rubygems.org. 600 CNAME 1.0.3.rails.index.rubygems.org.
1.0.3.rails.index.rubygems.org. 60 NAPTR 100 10 "U" "TCP+http" "!^.*$!http://rubygems.org/rails-3.0.1.gem!i" .
1.0.3.rails.index.rubygems.org. 60 NAPTR 100 20 "U" "TCP+http" "!^.*$!http://backup.rubygems.org/rails-3.0.1.gem!i" .
Note that there is no need to do any complex regex translation to get the various URLs since they are mapped directly to the canonical name of the gem.
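A client could turn the NAPTR answers into an ordered list of download URLs roughly as follows. This sketch works on the record data in presentation format, as written above, rather than on a live DNS response. The function names are hypothetical, and it assumes the regexp delimiter does not occur inside the pattern or the URL.

```python
import re
import shlex
from typing import List, NamedTuple


class Naptr(NamedTuple):
    order: int
    preference: int
    flags: str
    service: str
    regexp: str
    replacement: str


def parse_naptr(rdata: str) -> Naptr:
    """Split the six NAPTR fields, honouring the double quotes."""
    order, preference, flags, service, regexp, replacement = shlex.split(rdata)
    return Naptr(int(order), int(preference), flags, service, regexp, replacement)


def apply_regexp(field: str, name: str) -> str:
    """Apply a '!pattern!substitution!flags' expression to the queried name."""
    delimiter = field[0]
    _, pattern, substitution, flags = field.split(delimiter)
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.sub(pattern, substitution, name, count=1, flags=re_flags)


def download_urls(name: str, rdatas: List[str]) -> List[str]:
    """Return URLs from 'U' records, lowest order and preference first."""
    records = [parse_naptr(rdata) for rdata in rdatas]
    terminal = [r for r in records if r.flags.upper() == "U"]
    terminal.sort(key=lambda r: (r.order, r.preference))
    return [apply_regexp(r.regexp, name) for r in terminal]
```

For the two records above, the rubygems.org URL comes first because its preference (10) is lower than that of the backup (20).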
To support multiple platforms (e.g. JRuby), the client will first try platform.z.y.x.gemname.index.rubygems.org. If this is not found, then the client should use z.y.x.gemname.index.rubygems.org. If a platform gem is provided, then CNAME records will also need to be provided for all of the shortened variations, i.e. platform.y.x, platform.x and platform.
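The platform fallback amounts to an ordered list of candidate names. In this sketch the function names are hypothetical, the "jruby" label is only the example used above, and the exists argument stands in for whatever DNS lookup the client performs.

```python
def lookup_candidates(gem, version, platform=None, zone="index.rubygems.org"):
    """Names to try, in order: platform-specific first, then generic."""
    reversed_version = ".".join(reversed(version.split(".")))
    generic = f"{reversed_version}.{gem}.{zone}"
    if platform is None:
        return [generic]
    return [f"{platform}.{generic}", generic]


def first_existing(candidates, exists):
    """Return the first candidate for which exists(name) is true, else None."""
    for name in candidates:
        if exists(name):
            return name
    return None
```

If no platform-specific name has been published, the lookup falls through to the generic name for that version.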
DNS provides the tools necessary to make this a decentralized system if we desire. This would be accomplished by delegating responsibility for gem names to DNS servers other than the rubygems.org servers. For example, if responsibility for management of the Rails gem metadata were decentralized, then the interaction might look like this:
- Client sends question to local name server for TXT records at rails.index.rubygems.org
- Local name server does not have the record so it sends the usual response indicating that the search should go upstream to the roots
- Root delegates to .org name servers
- .org name servers delegate to rubygems.org name servers
- rubygems.org name servers respond with the following NS records:
rails.index.rubygems.org. 600 NS ds1.rubyonrails.org
rails.index.rubygems.org. 600 NS ds2.rubyonrails.org
- The question is then sent to one of the two name servers, which responds with a CNAME record pointing rails.index.rubygems.org to 1.0.3.rails.index.rubyonrails.org.
- The rubyonrails.org name servers would then respond as shown in the scenarios above.
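The effect of delegation is that the most specific delegated zone decides which name servers answer for a name. A minimal sketch of that rule follows. The function name is hypothetical, and ns1.rubygems.org is a made-up placeholder for the rubygems.org servers.

```python
DELEGATIONS = {
    # Placeholder server name, for illustration only.
    "rubygems.org": ["ns1.rubygems.org"],
    # Delegation from the example above.
    "rails.index.rubygems.org": ["ds1.rubyonrails.org", "ds2.rubyonrails.org"],
}


def responsible_servers(name, delegations):
    """Find the most specific delegated zone that contains the name."""
    labels = name.rstrip(".").split(".")
    # Try the full name first, then drop one leading label at a time.
    for start in range(len(labels)):
        zone = ".".join(labels[start:])
        if zone in delegations:
            return zone, delegations[zone]
    raise LookupError(f"no delegation covers {name!r}")
```

Names under rails.index.rubygems.org are served by the rubyonrails.org name servers, while every other gem still resolves through rubygems.org.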
DNSSEC provides a means for signing DNS records so that you can verify that the name server is authoritative for the particular question. This technology is not yet widely deployed; however, it does have the potential to provide a layer of protection against gem poisoning when used in conjunction with a SHA signature. The SHA signature could also be stored in the name servers using a TXT or SIG record. This technology is still very experimental, but the potential exists for a highly trusted distribution system.
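Checking a downloaded gem against a digest published in a TXT record might look like the sketch below. The "sha256=<hex digest>" TXT format and the function name are assumptions made for illustration, since no record format is defined here. The check only adds protection if the TXT record itself can be trusted, for example through DNSSEC.

```python
import hashlib
import hmac


def verify_gem(data: bytes, txt_value: str) -> bool:
    """Compare the SHA-256 digest of the gem bytes with a TXT record value.

    Assumes the hypothetical TXT format 'sha256=<hex digest>'.
    """
    algorithm, separator, expected = txt_value.strip().partition("=")
    if not separator or algorithm != "sha256":
        raise ValueError(f"unsupported digest record: {txt_value!r}")
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected.lower())
```

A client would call verify_gem with the bytes it downloaded and refuse to install the gem if the result is False.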
DNS does not provide a mechanism for searching for records given part of a name. For example, there is no mechanism in DNS to query for the term "active" and get "activerecord", "activeresource", etc. This functionality would need to be provided using a protocol other than DNS.