Analysis Summary
Worth Noting
Positive elements
- This video provides a clear historical and practical explanation of how Clojure's reader literals solve real-world data serialization problems.
Transcript
My name is Steve Miner, and I'm very happy to be here at Clojure/West. Today's talk is going to be about the Data Reader's Guide to the Galaxy. The inspiration for this talk comes from Douglas Adams, who was a British humorist best known for his Hitchhiker's Guide to the Galaxy, so that's where I'm stealing some of the better lines for this talk. The Hitchhiker's Guide to the Galaxy I knew as a book, but originally it started as a radio show, and then they made it into a TV series, the book came out, and even after Douglas Adams died they made a movie based on his guide. So the idea is that he took the same material and reused it in different ways. Even thinking about scripts, where the roles had labels and maybe had different actors playing the parts, that corresponds to how we're using data readers to deal with tagged literals: using the same concepts but maybe with different concrete implementations. So Douglas Adams has shown this really excellent reuse of his source material. As software people, we'd like to do the same kind of reuse.

The Guide aims to be the standard repository for all knowledge and wisdom, so in this half-hour talk we're going to try to get most of that covered. We'll start with the basics about tagged literals and data readers, talk about a few things that are new in Clojure 1.5, cover edn, which is kind of like JSON but for Clojure, and then at the end we'll talk about a few unorthodox ideas, some things you can do with data readers that maybe Rich didn't intend you to do.

Okay, starting with tagged literals. The format for a tagged literal is the hash character followed by a symbol, fully qualified, and then some other literal data that Clojure already knows how to read. That might be a string, numbers, vectors, anything that we already know how to cover. It also would work with another tagged literal, so you can get more complicated if you want to, but I think most people will just start with plain literal data. The tag is telling you some kind of
information, some way we want to interpret this literal data in a specialized way. But it's a weak form of contract. It's not as strong as saying this is a particular type or class, or even saying it has to fulfill some interface. It's just saying there's some concept here that we want to implement in your code. For example, there's the inst (instant) kind of data literal, and that transforms a string into an actual instant in time. We'll talk more about that; it's built into Clojure.

You can say that these tagged literals are self-describing data. "Self-describing" might be a little too strong a term, but the idea is that you're giving it some kind of label so people can know what this data is, rather than just saying, well, by convention I'll give you a long and you should treat that as a certain number of milliseconds since a reference time. You can actually implement it: maybe it's a Java class, maybe it's a record, or maybe it's just some other Clojure data structure, but you can choose a particular concrete implementation for that tagged literal.

We also say that it's loosely coupled to the implementation. You may have one choice in your application for these tagged literals, one interpretation of this data; someone else might choose to do it a little different way. In particular, there might be other programming languages, other sources of data, other users of your data that are using the same print representation but with totally different concrete realizations in your code. I think the whole motivation for this was to handle data transfer where Clojure's not necessarily in control of everything. Even between Clojure and ClojureScript there are some differences in the concrete implementations, so this is a way to cover over that and give you a way to express ideas without demanding so much from the other side.

It gives you an open concept here, and it leads to what we call the extensible reader. Instead of having the reader know only, you
know, the predefined Java types, now you can start adding your own types that the reader will just understand, or at least you have a chance to teach the reader how to understand your notation for these tagged literals. And you can customize it to your application; other applications might use a different implementation there.

In a sense this is a limited form of Common Lisp reader macros. I think Rich has done a good job of not just saying, well, Clojure is kind of Common Lisp but without a bunch of things I didn't like. He said ideas have to be useful, and he's collected a lot of good ideas, not just from Lisp but from other programming languages, and composed them so that they're a coherent whole. I think he was trying to be careful about not putting reader macros right into the language from the start, because we weren't sure we wanted all that. So this is a way of getting some of the functionality that Common Lisp has in reader macros, but in a more controlled form.

When you're thinking about data readers, I want to remind you that this all happens at read time. This is when we're taking the textual representation of your program and converting it into Clojure data structures. Then the result from your data reader is what gets handed to the compiler. So you're working at the top level of just what the reader is considering, and the compiler doesn't really know until you're done returning your result from your data reader.

The way you control what your data readers are going to do, how they interpret these tagged literals, is by this dynamic var called *data-readers*, with earmuffs on it. That's initialized by a resource in your project: if you have a file called data_readers.clj at the top level of any of your jars, or the top level of your project, that should be a literal map, and it identifies for a particular tag symbol what data reader function we want to call.
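The mechanism just described can be sketched in a few lines. This is a minimal illustration, not code from the talk: the tag `example/point` and the function `point-reader` are hypothetical names chosen here, and the `data_readers.clj` contents are shown only as a comment.

```clojure
;; Hypothetical reader function: it receives the already-read literal
;; that follows the tag (here, a two-element vector).
(defn point-reader [[x y]]
  {:x x :y y})

;; A data_readers.clj resource at the root of the classpath would hold
;; a literal map from tag symbol to reader var symbol, for example:
;;   {example/point my.app/point-reader}

;; The same mapping can be supplied at run time with binding.
(def parsed
  (binding [*data-readers* {'example/point #'point-reader}]
    (read-string "#example/point [3 4]")))
;; parsed is {:x 3, :y 4}
```

The reader function runs at read time, so whatever it returns is the value that `read-string` hands back.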
So it's a map of tag symbols to var symbols. And of course at run time you can bind *data-readers* and control the interpretation more specifically for a section of code.

So let's take a look. This is kind of a contrived example, but it's simplified so we can fit it on the screen. This function, my-reader, takes an argument. You just get one argument, but you can make it a vector, you can make it a map, something else that will combine multiple values. In this case it's a vector of two elements, and we're just going to multiply the first element by ten and add it to the second element, so we're expecting numbers here. You can see we do a binding for *data-readers*. The tag is a fully qualified tag. In your user code you should always have a namespace on your tags so you don't conflict with other libraries or other users. Clojure has some built-in tags, which we'll see in a second, that don't use a namespace. So we're declaring that my-reader is going to be the data reader, how we interpret any tag that looks like my.ns/tag. Then I'm calling read-string and giving it a string with our tag and the vector notation, with two numbers in it. The result there at the end is 42. So you can see how we called our reader function and it returned 42, and then that gets handed back to Clojure, which compiles it. It's already a literal, so there's not much evaluation that has to happen there, but you can do more complicated things.

Okay. If there's no data reader defined for a particular tag, the next step Clojure will take is to look at the predefined default data readers. You can't control these data readers; they're just built into Clojure. In particular there's the UUID, which will give you a unique value if you need that for a key or some other identifying data in your application. The nice thing about UUIDs is that you don't have to have any central authority helping to coordinate these unique values. It's an
algorithm that means you're very unlikely, even on two separate machines, ever to have a conflict where they might generate the same thing.

And inst is Clojure's abbreviation for instant; it represents an instant in time. As the Guide says, "Time is an illusion. Lunchtime doubly so." I feel the same way about daylight saving time, but that's a little bit of an aside. The definition for inst comes from RFC 3339. It just defines the string representation for an instant in time, and we're trying to say it's an instant in UTC, the same instant all around the world.

Okay, so you can abbreviate. There are a few examples here on the slide. You have to give the most significant part of the date-time first, but you can leave off the other pieces, and we default to appropriate zero values. So you can just have a simple date, and it goes all the way to an offset. We don't use time zones; those are so complicated and political. We just use the numeric offset, so it's an hour and minute offset, and you can think of those as resolving back to Zulu time, or Greenwich Mean Time, or UTC, when you're comparing things. We'll talk a little bit more about that in a second. And you can get all the way down to nanosecond resolution if that's what you want.

By default, Clojure uses java.util.Date as the concrete implementation for an inst. Date, unfortunately, is maybe one of the worst object-oriented classes ever defined. I think Sun's kind of embarrassed about that, and most of it is deprecated. It's still the best general class we have for handling date and time, or at least it's commonly used. I think if Java had done a better job, a better Date, maybe none of this would have been necessary and dates would have just been built into Clojure. But, and this is my interpretation, it's not so good, and a lot of people don't like it. The Java people came out with a better class
to try to improve the API a bit with Calendar. They made Calendar with a richer API and got rid of some of the embarrassing constructors they had with the original Date. Calendar is also sensitive to the offset, which is meaningful for some people. It preserves the offset, so if you say my time is situated at a certain offset, implying a certain time zone, then it will remember that, whereas java.util.Date was always converting back to UTC.

java.sql.Timestamp added nanosecond resolution. It's a really strange implementation, Timestamp. I think that's another big mistake that the Java people made early on, but the point is a lot of databases want nanosecond resolution, so they had to do something different, and they kind of patched it on to the existing Date. They even have some embarrassing wording where they tell you, well, don't use parts of Date; even though we're inheriting, we don't really mean that, it was kind of a convenience for implementation.

So all these Java dates that exist now are kind of embarrassing and not well designed. Most of the Java people are using something called Joda-Time, and in Clojure we have the clj-time wrapper over that. I would say if you're doing serious work with dates, that's probably a good library to use. If you have to make calculations with dates, you should look into that.

Then finally there's JSR 310, which is Java's, I don't know, maybe third or fourth try to get their dates right. The people who worked on Joda-Time are creating this new standard. They say it'll be similar to Joda-Time but have nanosecond resolution and a few other changes, lessons they've learned from Joda-Time. This is scheduled to go into Java 8, so someday there'll be a new java.time there, and they say they'll backport it to Java 7. I don't know how soon Clojure will ever get to requiring that version, Java 7 or Java 8, but you will
be able to use JSR 310 someday. So I'd say it's about time Java got time right.

We mentioned UUID. That's just a way to get unique values, and you can call out to Java to get that. And then a couple of other examples; these are just things that people might do with tagged literals. Again, it's simple: there's a hash sign, then a tag name, a symbol, and then some kind of data. It might be a string. For the last one here, for coordinates, you could imagine doing latitude and longitude as two floating-point numbers.

Now, if your tag is not known, it was not declared in your data readers and it's not built into Clojure, then normally you'll get an error. But there are cases where you want to handle things that you haven't anticipated, and for that Clojure 1.5 has added *default-data-reader-fn*. This is a dynamic var that you can bind to your own function, and you'll get a chance to decide what to do with that unknown tag. It defaults to nil, just like Clojure 1.4, and that will give you the error as usual. But if you bind it to a function, we'll call the function with the tag and the value. So it's a little different: the regular data readers just get the value, since you presumably already know the tag, but for the default data reader we'll give you both, and then you can return a value that's passed on to Clojure.

Here's an example of something to do with a default data reader. Suppose you had a record called TaggedValue, and it had fields for tag and value. You might define a print-method on that so that it prints nicely, and you would just see the hash, the tag, and then the value after it. Your default data reader function can use the record's factory function. Whenever you define a record, Clojure will define this function with the arrow and then the record's name, and it takes the fields. We've ordered the fields in the same order that the default data reader function wants to see them, so you can
pretty simply get that function to work and handle unknown tags as a record.

If you look here at how records and tags print, they're pretty similar. If you're printing a record, you'll see that you have the namespace and it ends with the record name, so it looks like a Java class, followed by a map. One thing to notice: there's no space between the record part of the name and the map notation with all the field values inside your record. You can imagine doing a tagged representation for that that's very similar, but for the last period we now need a slash, because we want a fully qualified symbol name there, and we need a space, because for a tagged literal we do the tag, then whitespace, and then the value that we want to interpret. In this case we'll interpret maybe the same map. So your default data reader could handle any kind of record if you use a tagged notation: from your tag we could look up the factory function and then call that.

I'm saying here that maybe you don't want to do this in general, because you don't want to accidentally accept all the typos and things. You might restrict your default data reader to handle just a certain namespace that you know you might be using in your application, and you might restrict it to handle just a capitalized tag. I think we can read this. The first part is just defining a function to get your factory function by taking apart the tag; we just take the name and put map-> in front of it. Then our default reader in this case is being careful. It's saying, well, if I have a map value, the first character of the tag's name is capitalized, and I can find a record factory for that, then I'm going to call that factory with the value. If I don't see that, then maybe I'll just treat it as an unknown generic tagged value. So this is a way, if you're just processing data
and you're seeing things that you don't understand but you want to pass them on to some other process, this will allow you to tolerate those unknown values.

If you're a library author and you're creating your own tag, these are a few ideas that you should consider. The first thing is you have to document what the semantics are for your tag, and what kind of base literal value you expect people to give you when they're using your tag. Maybe it's a string, that's kind of the most generic, but it might be something else like a vector, or really any of the Clojure data literals that are already predefined. You should think about what your implementation types are. Typically in Clojure you might want to use a record, or maybe you're connecting it to a Java class that already exists. In your library you need to provide some data reader functions so that the users of your library can decide. Often it's just one data reader, but as you saw with inst, Clojure provides three different kinds of data readers to handle the basic Date, the Calendar, or the Timestamp.

We'll talk a little bit more about the print functions in a minute, but if this data literal that you're defining, your tag, is going to be really owned by your library, you might want to define a print-method so that your particular implementation always prints that way. If it's something that may be used by other Clojure applications or other Java libraries, maybe you don't want to define printing, because that's kind of like taking ownership of that particular concrete type. The print methods go on the concrete type; we'll see that in a second.

In some cases you might want to define in your library a data_readers.clj, if you really own that type and you don't want anyone else overriding it or putting their own data reader on your type. But in general I'd say don't include a data_readers.clj
in your library, because your library would then be taking control of that tag so strongly that your users couldn't override it and do their own thing. You get a conflict if there are two different data reader resources that define the tag a different way.

Okay, so for printing. If you're interested in printing, you can take a look in the Clojure source at instant.clj, and you'll see the definition for how to print the three different kinds of concrete dates: java.util.Date, java.util.Calendar, and java.sql.Timestamp. One interesting thing, and I'm not sure how I feel about this completely, is that print-dup is basically ignored. The idea behind print-dup is that if you need to recreate the exact class that was being used, there's a Clojure notation for capturing that class name, so when it recreates the literal value it uses that particular concrete class. So if you're printing integers and you want to say, I've got to preserve this as an Integer, I don't want to go to the default long that Clojure uses, there's a way to do that. But most of the time you don't care about that. In our date printing code we're just saying, okay, we're never going to preserve the particular concrete class; we're just going to always convert to an inst and let our users do what they want with that.

There are a few gotchas if you're defining your own data readers. The first thing is if your data reader returns nil, you're going to throw an error. Normally you wouldn't want to return nil anyway, but when I was doing some experiments I said, well, what would happen? And I got an unhelpful error; that's the worst part. If for some reason you did want to return nil, then just return an expression like quote nil, that is, the list (quote nil), because that will be handed back to the compiler and will end up returning nil as the final
evaluation, but you'll avoid this unfortunate error. I think that'll probably get fixed fairly soon; we had a patch for that, but it didn't come in time for Clojure 1.5.

The other gotcha: when you're picking your tag names you want to be fully qualified, but in the name part of the tag, after the slash, don't use a period, just for now, because that will lead to an error. That's just kind of an accident inside the Clojure source; it's not supported right now, but I think it's intended to be supported down the road.

And just to remind you, all the Java date classes, the concrete types for inst, are mutable. You shouldn't ever mutate a value you've handed to Clojure, but if you're getting it from some kind of user interface element and it's wired up in a bad way, you might end up changing the underlying value, and that would cause confusion in your Clojure code. So just make sure your dates don't get mutated.

Okay, now in this example I wanted to show you inst in the long form. Those first two expressions are really the same instant in time, right? Because we take the offset, subtract the offset, so that's a negative seven, it's like adding something back for the hours, and so those will be the same thing. That's true, but if we're using Calendar as our concrete type, Calendar remembers the offset, so they don't compare equal unless the offsets are exactly equal. So do be careful about that if you're using Calendar.

Okay, the next topic is about read-eval. I know maybe half of the people in here earlier were kind of new to Clojure, so read-eval you may not have heard much about, but let me go over it quickly and then you can maybe understand what's going on. I call this the read-eval kerfuffle, because in the community, on the mailing list, we had some issues about what was happening with read-eval. This all started because Ruby on Rails had some vulnerabilities, and in fact I think several Rails
sites were compromised. It led to a lot of problems in the Rails community; they had to really work hard to get some things patched. That was all due to, maybe, a YAML parser that was allowing code to be executed. It wasn't the intention of the people who put the system together; it was just an accident that they left this kind of back door open.

Clojure has a similar facility for allowing you to execute code at read time, and by default *read-eval* is true. *read-eval* controls whether or not you can use this special notation: it's the hash-equals and then some expression, and we can execute code. That's a useful facility for a lot of what Clojure needs to do, but it's not safe if you're reading untrusted data. You can't just hand that to read, or evaluate some string you got from a user with read-string, without being careful about filtering it.

The issue there for a lot of people, though, is that if you know about read-eval, you know what you have to do and you can protect yourself, but a lot of people who are coming new to Clojure may not have known about this. The documentation didn't really call this out; you had to work a bit, or go to some of the conferences and hear about this, before you'd know about it. So there were a couple of bugs filed and a few complaints on the mailing list. Clojure 1.5 was about to ship, in release candidate, and the first back and forth was, well, do we really need to deal with this now? Enough people complained, I think, that finally Rich decided he'd just say, don't panic, guys, we're going to do something about it.

The first thing you need to know is that Clojure's read, as the Guide says, was designed by hyper-intelligent pan-dimensional beings, and by that we mean all the great Lisp hackers going way back to the beginning. Common Lisp certainly has this kind of facility where you can evaluate code in the reader. As we said before, read is for trusted input.
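A small sketch of the read-eval behavior described above, assuming Clojure 1.5 or later. The var names `evaluated` and `blocked` are illustrative only; the input strings stand in for text that might arrive from an untrusted source.

```clojure
;; With *read-eval* true (the default described in the talk), the #= notation
;; runs code while the string is being read, before any explicit eval.
(def evaluated
  (binding [*read-eval* true]
    (read-string "#=(+ 1 2)")))
;; evaluated is 3

;; With *read-eval* bound to false, the reader refuses the #= form
;; and throws instead of executing it.
(def blocked
  (try
    (binding [*read-eval* false]
      (read-string "#=(+ 1 2)"))
    (catch Exception _ :rejected)))
;; blocked is :rejected
```

The first form is the reason `read` and `read-string` should only ever see trusted input.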
Some of the experts knew that you had to bind *read-eval* to false, but before 1.5 came out, even with *read-eval* false you could still execute Java constructors, and if the bad guy was clever he could think of some Java constructor that could cause problems for you. But we needed to do something like this to handle records, because we're trying to construct essentially Java instances on the fly. So there's the issue. As the Guide says, "A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools." That's a little harsh, but I thought it was funny.

Okay, so the solution that Rich came up with, and I think this turned out really good in the end, though it was a little rough getting there: clojure.edn is a new namespace with new facilities. It's a new way of reading that's safer. We get rid of the sharp-equals construct altogether, and we're not going to call any Java constructors. We're just going to allow the notation for what's known as edn, and we'll get to that in a minute. This is giving you a facility that's safe for reading data from, say, users or other untrusted sources. This will never execute code.

clojure.edn/read is similar to read but a little bit different in the way you manage your data readers and the default data reader. We're going to take a map of options that define the readers. There are three things: the end of file, what you return at the end of the file; what data readers we're going to use for reading; and what your default data reader is. So instead of using dynamic vars and doing binding, kind of in a global sense, we're putting these options right into the call to read. It's very similar to clojure.core/read: it defaults to *in* as the stream, and it will use the default data readers. And read-string is the same idea, but our input is coming out of a string.

Now another solution is to
use the contrib library clojure.tools.reader. That's written completely in Clojure, so it's accessible; you can take a look at it and make changes if you want. They've done a good job of keeping up with whatever Clojure's doing on the main Clojure reader. They are now including an edn-only reader, so you'll get the same kind of API but in the clojure.tools.reader namespace, and this works all the way back to Clojure 1.3. So if you have old code, you should take a look at clojure.tools.reader. If you have old code and you're calling read, you're probably better off going to the new edn style of reader. It's safer.

So what is edn? It's a similar idea to JSON. It's known as the extensible data notation. There's a website that will take you to the details; it's kind of an emerging standard. It handles all the Clojure data types that you'd expect, including symbols and keywords, and of course, the important thing to make it extensible, it defines the syntax for tagged literals. We're now getting other implementations in place for other languages, so you can think about sharing data using edn notation. It's kind of making the world safe for Clojure-style printing, Clojure data notation. So I'm calling edn the Babel fish of data formats. The Babel fish was something from the Hitchhiker's Guide that allowed people to understand other languages; it took care of worrying about translations for you.

So, the Guide on XML. When talking about data formats, we always have to start with XML. The Guide says, "In the beginning, XML was created. This has made a lot of people very angry, and has been widely regarded as a bad move." I know the Guide is opinionated about some things. XML is important, and we all have to deal with XML. It's a little bit complicated. I call it the Encyclopedia Galactica of data formats, because it is common and it's a very strong standard. But going way back to the
beginning, I think of XML as S-expressions with better marketing. Even Tim Bray, an XML expert, kind of concedes the point that S-expressions would have worked fine. But at this point everyone likes XML, so XML kind of won there.

JSON was a reaction, I think, to the complications of XML, and Douglas Crockford describes it as a fat-free alternative to XML. He's also said the good thing about reinventing the wheel is that you can get a round one. So he really emphasized, let's make this simpler; we don't need all the complications that the experts created in XML. One issue with JSON is that it's not extensible, and people get around that by using conventions and object encapsulation, but I think maybe with edn we can do a little bit better.

JSON has shown the way to do something other than XML, and it's a huge success. edn wants to follow in its footsteps, but it's extensible, and it's a little more formal in the way it's extensible. It has some more base types which are useful, the Clojure symbols and keywords. It's a little bit more syntax, but it's worth it for our extra types. There's maybe a different angle in that Clojure's about values, so we're conveying values, not objects. And a last point is it's slightly cheaper, and by that I mean if you're already in Clojure, edn is natural; it's what you're used to. JSON is a little more work, unless you're in JavaScript, of course, and some other language is maybe still more work to deal with.

All right, so now I come to my unorthodox ideas. We're going to go quickly through these. If you're thinking about Roman numerals, the usual thing might be that we have a way to parse Roman numerals, and usually you'd pass a string. But think about it: data readers get the first shot at the data. Even a symbol, a bare symbol with no quotes, is available to your data reader. It's a little unusual maybe to see a symbol there and have a special interpretation of it, but your
reader will get first shot at that. The compiler never sees that symbol coming after your tag, so even if you had it bound to some value, that's not what would happen. So you can take apart symbols and do something interesting there, or reinterpret symbols using a tagged literal.

People sometimes complain about the prefix notation for math. They want something better; of course, what they mean is RPN. You could easily write an RPN interpreter there in your data reader. I'm suggesting you'd want to expand that into the usual Clojure notation; some other language might expand it in a different way. Then that gets evaluated as usual.

Spyscope is a library I saw from David Greenberg. He announced it on the list a while ago, and it was a clever use of tags, not so much for creating new literals but just for the convenience of using them for debugging and tracing. He was dropping in his #spy/d notation in front of any form and then getting some debugging information. I used to have my own debugging macro, but I had to call regular macros, I had to wrap the parentheses just right, or I'd use printlns, but again you'd have to wrap it just right; you have to change your code a lot. The idea here is that you can drop this annotation right into the code anywhere in the middle and it doesn't disturb the rest of your code, so it's easy to get in and out. I think it's more useful than println kind of debugging.

The final kind of crazy idea here is my idea for doing a conditional feature reader, and in this case I've been calling it condf. It looks kind of like a cond in that you have these pairs of some kind of test and then some kind of result, but the test here is a special DSL with the usual kind of combinations: you can have and, or, not. Then we're taking symbols, but we're interpreting these symbols, so I'm kind of
ripping apart that symbol, jdk 1.6 plus, and that means, okay, I'm talking about Java version 1.6 or greater, so the plus at the end is interpreted as "or greater". And clj 1.5 dot-star, the dot-star means any version of 1.5, so it wouldn't cover, say, 1.6 or 1.4. With combinations of those, what I'm calling feature identifiers or feature versions, you can do interesting things. This is all happening at read time, so the compiler will never see any of the conditions that don't succeed. In this case, reducers are only available if you're using Java 1.6 and Clojure 1.5, so if you're using Clojure 1.5 on an older version of Java, reducers won't work. And then else is just another literal that I'm interpreting specially as always true.

Okay, so in summary: tagged literals are mostly harmless, clojure.edn is your safe space, and data readers open up all kinds of crazy ideas. And finally, if you're interested in Douglas Adams, check out towelday.org. The 25th of May is the day we all wear our towels to remember Douglas Adams.

Okay, that's it. I've used most of my time; maybe I have a second for any questions. Sorry, I can't see anybody. Okay, yeah.

Very good. Yes, you can compose tags, because you just think of the rightmost tag as getting interpreted as some kind of data, some literal, and then that's the value that gets passed back to the leftmost tag. So yeah, they'll work that way, but it does get a little complicated, especially if you're using all kinds of crazy concrete implementations. That means your other data readers for those tags would have to know how to handle those other classes.

Okay, well, thank you very much.
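As a recap of the clojure.edn options and the tag composition mentioned in the closing answer, here is a minimal sketch. It is not the speaker's slide code: the `TaggedValue` record follows the talk's description, `my-reader` mirrors the contrived example (first element times ten plus the second), and the tags `my.ns/neg` and `other.ns/thing` are hypothetical.

```clojure
(require '[clojure.edn :as edn])

;; Holds any tag that has no registered reader, as the talk suggests.
(defrecord TaggedValue [tag value])

;; Modeled on the talk's contrived example.
(defn my-reader [[a b]]
  (+ (* 10 a) b))

;; Options go into the call itself instead of dynamic var bindings.
(def opts
  {:readers {'my.ns/tag my-reader
             'my.ns/neg -}          ; hypothetical second tag: negate the value
   :default ->TaggedValue})         ; called with the tag and the value

(def known    (edn/read-string opts "#my.ns/tag [4 2]"))
(def unknown  (edn/read-string opts "#other.ns/thing {:a 1}"))
;; Composed tags: the rightmost tag is read first, and its result is
;; passed to the reader for the tag on its left.
(def composed (edn/read-string opts "#my.ns/neg #my.ns/tag [4 2]"))
```

Here `known` is 42, `unknown` is a `TaggedValue` carrying the symbol `other.ns/thing` and the map, and `composed` is -42. None of these calls can execute code embedded in the input, which is the point of reading with clojure.edn.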
Video description
from infoq