These notes were inspired by the first Tech Salon meeting, which began the conversation about software tools and toolkits. Software tools and toolkits form a very large subject area with many fuzzy distinctions. While this document is a work in progress, it may serve as an introduction to the topic, especially for someone joining our software tools enthusiasts group.
A human being without tools has little power in the world. A tool enables a person skilled in its use to be highly productive with specific tasks. We have developed our skills and our tools together; they are interdependent. Our tools are generally organized in toolkits oriented towards specific application areas, such as:
Returning to the toolkits you identified above, please answer these questions:
Software Tools are collections of deployable modules of code which make us productive in solving problems that are informational in nature, including
Toolkits are typically organized by application areas. Software tools are alternatively or additionally organized by the data formats which they understand. Tools can often be creatively employed across application areas as long as they can process the data. There can be a lot of power in keeping data formats very open and general so that more tools are potentially applicable. There is a creative tension between the efficiency of custom data formats and the flexibility of more general data formats.
Note that simple data formats basically leave all the work of understanding the meaning of the data to a human observer. For example, a table of numbers has no objective meaning. At a higher level, an XML Document or a Relational Table, when accompanied by its Schema, becomes partially self-descriptive. Programs can automatically treat the data appropriately, which raises it to the level of information. At the highest level, Knowledge Representation formats allow the meaning of information to be put intelligently to use by general-purpose programs. It is very useful to have software toolkits spanning all of these levels.
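The jump from raw numbers to partially self-describing data can be sketched with Python's standard XML module. This is only an illustration: the element and attribute names here are invented, not drawn from any real schema.

```python
import xml.etree.ElementTree as ET

# A bare table of numbers: a program cannot know what these mean.
raw = "98.6 37.0"

# The same values tagged in XML (element/attribute names are hypothetical).
doc = ET.fromstring(
    "<reading>"
    "<temperature unit='F'>98.6</temperature>"
    "<temperature unit='C'>37.0</temperature>"
    "</reading>"
)

# The explicit tags let a generic program treat the data appropriately.
for t in doc.findall("temperature"):
    print(t.get("unit"), float(t.text))
```

With the bare string, only a human observer knows these are temperatures; with the tagged form, any XML-aware program can act on the units.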
The Unix Operating System promoted the concept of general-purpose tools which could process nearly any kind of data as long as that data was expressed as lines of human-readable text. The GNU Project improved and extended the original Unix text toolkit to create the tools which are now standard on the GNU/Linux platform and very popular among sophisticated users and professionals on all modern platforms.
Much of the power of using lines of text to represent data comes from the ease with which humans can directly understand the data. The developers at Bell Labs who created the first version of these tools also incorporated a sophisticated pattern-matching system called Regular Expressions, which made it possible to deal with complex data formats within, and sometimes across, lines of text.
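A minimal sketch of Regular Expressions carving fields out of lines of text, using Python's standard re module; the line format here is invented purely for illustration.

```python
import re

# Hypothetical data lines: a name, a count, and a date on each line.
lines = [
    "widgets 42 2024-01-15",
    "gadgets 7 2024-01-16",
]

# One Regular Expression describes the entire line format.
pattern = re.compile(r"^(\w+)\s+(\d+)\s+(\d{4}-\d{2}-\d{2})$")

records = []
for line in lines:
    m = pattern.match(line)
    if m:
        name, count, date = m.groups()
        records.append((name, int(count), date))

print(records)  # [('widgets', 42, '2024-01-15'), ('gadgets', 7, '2024-01-16')]
```

The same pattern both validates each line and splits it into typed fields, which is the core trick behind tools like grep, sed, and awk.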
Lines of text describable by Regular Expressions start to be a poor format when the data
Some important tools which empower users working with data organized as lines of text include
Lines of Text and Regular Expressions cannot easily represent data which is hierarchical in nature, i.e. data involving nested patterns. These deficiencies lead many developers towards an emerging toolkit built around XML-based hierarchical formats. XML formats (or XML languages) such as XHTML for web pages and ODF for office documents are still human-readable text, but the format of the data is given by explicit tags rather than by lines and delimiters. In fact, in XML-based formats, lines (and indentation) are no longer important except to make the data flow nicely when it is being read by humans.
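The nesting problem can be sketched with Python's standard XML module: the structure below is awkward to capture with line-oriented Regular Expressions, but trivial to walk as a tree. The sample markup and helper function are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Nested (hierarchical) data that lines-and-delimiters handle poorly.
xhtml = """
<ul>
  <li>tools
    <ul>
      <li>grep</li>
      <li>sed</li>
    </ul>
  </li>
</ul>
"""

tree = ET.fromstring(xhtml)

# Walk the hierarchy; line breaks and indentation are irrelevant.
def items(node, depth=0):
    out = []
    for li in node.findall("li"):
        out.append((depth, li.text.strip()))
        for sub in li.findall("ul"):
            out.extend(items(sub, depth + 1))
    return out

print(items(tree))  # [(0, 'tools'), (1, 'grep'), (1, 'sed')]
```

Reformatting the markup onto one line, or onto twenty, changes nothing: the tags alone carry the structure.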
Some important tools which empower users working with XML-based text include
Representing and processing data as lines of text or as XML documents becomes a problem as the size of the dataset grows, and also when a user requires complex correlations across the data. These scaling issues lead to a desire for more compact binary formats accompanied by efficient index structures. Binary formats can easily be more than ten times more compact than text and more than 50 times faster to process (because the data does not have to be parsed). Suitable choices of indexes can often reduce the time to perform a complex operation from enormously long (hours, days, years?) to a few seconds or less.
Relational databases are the most popular and are very general. Many people believe that relational databases cannot efficiently express hierarchically or network-organized data, but this is not true. While the relational model underlying relational databases is very general and powerful, it is often necessary to write and maintain complex schemas to get those advantages. Many relational databases have extensions which cater especially to XML-formatted text data. Additional extensions and metaprogramming techniques can further extend the advantages of the relational model.
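The claim about hierarchies can be sketched with SQLite, which ships in Python's standard library: a parent_id column stores the whole hierarchy in one flat table, an index speeds up child lookups, and a recursive query walks the tree inside the database. The table and column names are invented for illustration.

```python
import sqlite3

# A hierarchy stored in a plain relational table via a parent_id column.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
db.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    (1, None, "root"),
    (2, 1, "docs"),
    (3, 1, "src"),
    (4, 3, "main.c"),
])
db.execute("CREATE INDEX idx_parent ON node(parent_id)")  # index for fast child lookups

# A recursive query walks the hierarchy entirely inside the database.
rows = db.execute("""
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 0 FROM node WHERE parent_id IS NULL
        UNION ALL
        SELECT n.id, n.name, t.depth + 1
        FROM node n JOIN tree t ON n.parent_id = t.id
    )
    SELECT name, depth FROM tree ORDER BY id
""").fetchall()

print(rows)  # [('root', 0), ('docs', 1), ('src', 1), ('main.c', 2)]
```

No special hierarchical extension is needed: the flat schema plus a recursive query recovers the tree, and the index keeps the join fast as the table grows.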
Entities (Data Objects) in Procedural and Functional Languages live in the Virtual Memory Space of the Process running a Computer Program. Normally, when the Process running a Computer Program stops, the Virtual Memory Space holding the data is discarded, and any objects which have not been saved in some manner outside of the Process are lost. It can also be difficult for objects running in one Process to communicate and coordinate with objects running in other Processes.
Here are some strategies for Objects to save and restore themselves between executions of their Programs by a Process:
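One common strategy is serialization: converting an object's state to bytes that outlive the Process. A minimal sketch using Python's standard pickle module; the Counter class is invented for illustration, and an in-memory byte string stands in for a file or database.

```python
import pickle

# A hypothetical object whose state should outlive its Process.
class Counter:
    def __init__(self):
        self.count = 0
    def bump(self):
        self.count += 1

c = Counter()
c.bump()
c.bump()

# "Save" before the process exits (bytes that could be written to disk).
saved = pickle.dumps(c)

# ...later, in a fresh process, "restore" the object from the saved bytes.
restored = pickle.loads(saved)
print(restored.count)  # 2
```

The restored object is a new object with the same state, which is exactly what survives between executions; live details like open files or network connections do not survive and must be re-established.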
Within the Virtual Memory Space of an executing program, software tools are primarily organized as procedures called functions or methods. Procedures are the most efficient and flexible kind of software tool. Procedures have a very fine granularity, which means that procedural toolkits (often grouped into Interfaces, Service Packages and APIs) can become exceedingly complex. Refactoring tools can be very helpful for managing the complexity of procedural toolkits.
To coordinate their activity across multiple processes, which may span multiple computers across computer networks, objects employ many strategies, such as
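One widely used pattern is passing serialized messages between processes. A minimal Python sketch of that pattern, using a thread and a queue as a stand-in for a separate Process (real deployments would use sockets, pipes, or an RPC layer); the message shape is invented for illustration.

```python
import json
import queue
import threading

# A mailbox carrying serialized (JSON) messages between "processes".
mailbox = queue.Queue()
results = []

def worker():
    # Receive a serialized request, decode it, act on it, record a reply.
    msg = json.loads(mailbox.get())
    results.append({"reply": msg["x"] * 2})

t = threading.Thread(target=worker)
t.start()
mailbox.put(json.dumps({"x": 21}))  # send a serialized request
t.join()
print(results)  # [{'reply': 42}]
```

Because the message is plain serialized data rather than a live object reference, the same pattern works unchanged when the two sides run on different machines.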
When Steve Russell implemented the first version of the Lisp Programming Language (in 1958!), he discovered that he could use the same representation for the Lisp Programs as those Programs used to represent Lisp Data, a very general representation called Symbolic Expressions or S-Expressions for short. This made Lisp the first Homoiconic Language. Lisp programs consist of procedures called functions. Since Lisp functions are written as S-Expressions and Lisp functions can easily generate and manipulate S-Expressions, it is especially easy to write Lisp functions to write other Lisp functions or even whole Lisp programs. This metaprogramming capability makes Lisp a particularly powerful environment for building software tools. The original Lisp has spawned many dialects over the years and Lisp-family languages remain favorites of programmers dealing with highly challenging application areas.
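The code-as-data idea can be sketched even outside Lisp. Here S-Expressions are modeled as nested Python lists, with a tiny evaluator; this is not real Lisp, just the shape of the idea, and the operator set and helper function are invented for illustration.

```python
# S-Expressions sketched as nested Python lists, e.g. ["+", 1, 2].
def evaluate(expr):
    if isinstance(expr, (int, float)):
        return expr
    op, *args = expr
    vals = [evaluate(a) for a in args]
    if op == "+":
        return sum(vals)
    if op == "*":
        result = 1
        for v in vals:
            result *= v
        return result
    raise ValueError(op)

# Metaprogramming in miniature: a program that writes a program.
def make_square(n):
    return ["*", n, n]   # builds the expression (* n n) as plain data

code = make_square(7)    # ['*', 7, 7] -- just a data structure...
print(evaluate(code))    # ...until we evaluate it: 49
```

Because the "program" is an ordinary data structure, ordinary functions can build, inspect, and rewrite it before it ever runs, which is the essence of Lisp's metaprogramming power.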
In some ways software tools based on Lisp face the same challenges as the other Object-Oriented systems mentioned above, but in some ways Lisp's structure makes things easier.
Knowledge Representation is the most general encoding of human-meaningful information into forms which allow meaningful computation by general-purpose computer programs. Many powerful and expressive representation systems have been invented and implemented along with accompanying software tools. Ultimately this area has the greatest potential for applying computation to serve human needs, yet most of today's computer programmers are mostly or entirely unaware of it! Users often say "I just want to tell the computer what I want and have it do it." The key to having a computer program understand what you want is a user interface which translates your wishes into a suitable Knowledge Representation format.
The general Wikipedia article on Knowledge Representation is a great place to start, but I'm going to list here a few of the key Knowledge Representation systems which are available and potentially useful.