WSSWebServicePackager tool - Deploy Custom Web Services in Sharepoint

June 17, 2008 on 3:58 pm | In .NET Coding | No Comments

I had already slogged through the publishing of my own web services in Sharepoint a few times, as directed by The Horses Mouth. It’s not pretty, and it’s not fun. Takes about 20 minutes to half an hour, could be faster if you really know how to do it, I guess I’m just put off by all the careful quoting and whatnot.

Anyway, no matter what they tell you XML is not meant to be a human readable format! So it just so happens there is a tool that makes my life a whole bunch simpler. Without further ado, I’ll just give you the link. Go check it out. (It has a gui, so don’t worry about where the command line parameters reference is.)

NOTE: There is an OLDER version of this tool by another author on CodeProject — this is NOT that tool, this one is meant for WSS 3 (.net 2.0), the CodeProject one is for WSS 2 (.net 1.1).
It says that there is a requirement to place the file in the same directory as Disco.exe, so, using Visual Studio 2008 you can either copy the disco.exe from the Microsoft SDKs\Windows\v6.0A\bin to another folder where you put the WSSWebServicePackager.exe. OR you can just dump the WSSWebServicePackager.exe into the Microsoft SDKs\Windows\v6.0A\bin folder and it’ll be in your path automatically when you do a Visual Studio Command Prompt.

Make sure you save the files into the Layouts directory under your 12 hive (I mean specify the layouts directory when you’re running the tool as the output directory.) After you’re done with that you’ll have to copy them to the ISAPI directory, and edit the xml file so it can be discovered as if it were a regular old sharepoint web service. (This tool makes it so you don’t have to do all the steps in the first link I posted, but YOU STILL HAVE TO READ IT. There are some things that you won’t get except from that document, and unless you’re able to assume knowledge directly, well.. just go here and make sure you covered everything.

Context-Aware SharePoint Designer Custom Workflow Activities

May 7, 2008 on 4:11 pm | In .NET Coding | 2 Comments

I had this nice little idea that I could teach the guys in my department how to do workflow with the Sharepoint designer so I wouldn’t have to do all that myself with the big Visual Studio Guns. All was going well until they needed to send out an email containing the URL of the site from which we were sending it. Not a link to an item in the site, the actual URL of the site itself.

“Simple!,” said I. I’ll just get it from the sharepoint object model with a custom activity. I wrote it all up, tried to use SPContext.Current.Web… and got it to puke nicely on a Null reference to Current. Of course! Why would a workflow activity have any reference at all to sharepoint? Technically a workflow activity doesn’t care where it’s hosted. But mine were to be hosted in Sharepoint, and that’s what I wanted to access. So, I looked around on the web and found nothing. I knew it had to be possible to get the context somehow, because other Sharepoint Designer Activities that came in-the-box were using it… so I looked for a clue.

I cracked open Microsoft.SharePoint.WorkflowActions.dll in Lutz Roeder’s Reflector, and lo-and behold there is a very cheap trick at work here!

Microsoft has implemented a custom dependency property by the name of __Context that is picked up as a Microsoft.SharePoint.WorkflowActions.WorkflowContext object IF it exists in your code. If you don’t implement such a property then it’s just not there, and nobody’s the wiser. This WorkflowContext object has a reference to .Site and .Web as properties, so it’s really handy to have at from within your custom activities that need to know where they are.

I’d like to post the code I found directly, but I can’t since it’s not mine. I will show you, however, what my code looks like.


 
public static System.Workflow.ComponentModel.DependencyProperty __ContextProperty =
            System.Workflow.ComponentModel.DependencyProperty.Register("__Context", typeof(WorkflowContext), typeof(FooActivity));
 
        [Description("Context")]
        [ValidationOption(ValidationOption.Required)]
        [Browsable(true)]
        [DesignerSerializationVisibility(DesignerSerializationVisibility.Visible)]
        public Microsoft.SharePoint.WorkflowActions.WorkflowContext __Context
        {
            get
            {
                return ((Microsoft.SharePoint.WorkflowActions.WorkflowContext)base.GetValue(FooActivity.__ContextProperty));
            }
            set
            {
                base.SetValue(FooActivity.__ContextProperty, value);
            }
        }
 

That’s a lot of typing if you write a bunch of these activities, so I made a codesnippet just for all of you out there to have and keep forever so you don’t have to do this. (Warning, do not just make this into a base class and derive your activities from it. It will throw an exception when you try to use it. I’m not sure why, but it does. You have to add this to every single activity you create for SPD that needs to know the sharepoint context.)

Here’s the code snippet I made:


<?xml version="1.0" encoding="utf-8" ?>
<CodeSnippets  xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
  <CodeSnippet Format="1.0.0">
    <Header>
      <Title>spwfactivity</Title>
      <Shortcut>spwfactivity</Shortcut>
      <Description>Code snippet for creating a Sharepoint-Aware Workflow Activity</Description>
      <Author>Dolan</Author>
      <SnippetTypes>
        <SnippetType>Expansion</SnippetType>
      </SnippetTypes>
    </Header>
    <Snippet>
      <Declarations>
        <Literal>
          <ID>classname</ID>
          <ToolTip>Class Name</ToolTip>
          <Default>Activity1</Default>
        </Literal>
      </Declarations>
      <Code Language="csharp"><![CDATA[public partial class $classname$: System.Workflow.ComponentModel.Activity
  {
        public static System.Workflow.ComponentModel.DependencyProperty __ContextProperty =
            System.Workflow.ComponentModel.DependencyProperty.Register("__Context", typeof(WorkflowContext), typeof($classname$));
 
        [Description("Context")]
        [ValidationOption(ValidationOption.Required)]
        [Browsable(true)]
        [DesignerSerializationVisibility(DesignerSerializationVisibility.Visible)]
        public Microsoft.SharePoint.WorkflowActions.WorkflowContext __Context
        {
            get
            {
                return ((Microsoft.SharePoint.WorkflowActions.WorkflowContext)base.GetValue($classname$.__ContextProperty));
            }
            set
            {
                base.SetValue($classname$.__ContextProperty, value);
            }
        }
 
    public $classname$()
    {
      InitializeComponent();
    }
  }]]>
      




You’ll also have to add a Parameter value to the .actions file with your activity that specifies the __Context property as an Input property.


  .
  .
  .
  <Parameters>
    <Parameter Name="__Context" Type="Microsoft.SharePoint.WorkflowActions.WorkflowContext, Microsoft.SharePoint.WorkflowActions" Direction="In"/>
   .
   .
   .
  </Parameters>

This one took me a while to figure out, but finally it worked!

If you want some more info on adding your own actions to designer (my article has thus far assumed you’re familiar with the process,) check out this link.

Update: I forgot to mention that just about the DAY after I figured this out on my own, I stumbled onto this codeproject article that dealt with this exact topic.

Just for fun I’ve also attached the snippet file so you don’t have to copy and paste it. Just save it into “%ProgramFiles%\Microsoft Visual Studio 9.0\VC#\Snippets\1033\Visual C#.”

Click here to download it.

Visual Studio 2008 Workflow designer BUG!! (feature?)

April 7, 2008 on 11:53 am | In .NET Coding | No Comments

I just got back from tearing my hair out over a frustrating bug I found in the workflow designer included with Visual Studio 2008. First let me start by saying that developing using sharepoint as a platform for workflow, especially with integration into Visual studio has been an overall pleasant experience. I have been able to work at this much faster and easier in 2008 than I ever had using the workflow extensions for VS 2005. Of course, sharepoint is my focus for a workflow host, so I’m happy with the options I have out of the box. That won’t be the case for many who use workflow for other, non-sharepointy things. You’ll have to wait for the rumored and afamed 2008 version of the extensions to be released in June of this year.

Now, about this bug. As you may or may not be aware, workflow serializes it’s state to a persistence provider when it’s ‘put to sleep.’ This state is basically saved a as state bag (hashtable) of data and pumped as a binary lump to a persistence provider. Each property of each item is known by a key consiting of, among other things, its property name. Now, since multiple activities, or even state variables, can possibly contain properties of the same name, it makes sense to add something else to the mix to identify one as being unique from the other. You might think they’d use some kind of a guid or something, and internally that’s involved somehow, however, there is a different, seemingly more accessible way that workflow foundation solves this problem. Activities that require continuations based on the serialization also require that you specify what’s known as a Correlation Token. This is a pairing of an arbitrary string with an owner activity. Event driven activities, for one example, that wish to consume sequential or related events will need to use the same Corellation Token if they are to act upon the same item. The example that I’m using (and so happens that I was cursing) is the CreateTask to OnTaskCreated pairing. If you want to handle the OnTaskCreated event for a particular CreateTask activity you have to make sure that you’re using the same Correlation token.

Easy enough right? The designer lets you, via the property panel, enter the name of the token, and select the owning activity. That’s fine, of course, unless you start getting to antzy about the order or layout of your activities and you cut one here and paste it there. After you’ve copy/pasted one to re-arrange it, you’ll notice upon inspection of the property grid that all seems to be intact. The correlation token is the same as it was, specifying the same name of the token, as well as the identical owner activity. When I run the workflow after doing this, I discoved something quite strange.

The OnTaskCreated handler wasn’t firing until the next one was created.

After much pain and suffering, and a support call to microsoft, which I paid for, but as of yet have got no response to it, I discovered something in the designer code that explained my problem. (This code appears in the [workflowname].designer.cs file, in the InitializeComponent method’s implementation block.)

The tokens look the same, as in they have the same attributes, but they are two separate instances of the CorrelationToken class. This is probably something that the designer overlooks, because it’s simply creating objects with the values to fulfill the properties I’ve set, instead of backtracking to see if some token or other is already there from before. So I corrected the assignment statements to use the same tokens for both instances, and it’s all better now.

see here:


       correlationtoken1.Name = "someStateTaskToken";
       correlationtoken1.OwnerActivityName = "someStateActivity";
       this.SendDirRejectionNotice.CorrelationToken = correlationtoken1;
...
 
        correlationtoken2.Name = "someStateTaskToken";
        correlationtoken2.OwnerActivityName = "someStateActivity";
        this.SendDirApprovalRequired.CorrelationToken = correlationtoken2;
 

should become this:


       correlationtoken1.Name = "someStateTaskToken";
       correlationtoken1.OwnerActivityName = "someStateActivity";
       this.SendDirRejectionNotice.CorrelationToken = correlationtoken1;
...
 
       this.SendDirApprovalRequired.CorrelationToken = correlationtoken1;
 

I just looked through the rest of the deisgner generated code for the same sort of constructs to make sure that I had no dups. Phew, what a train wreck! I had to correct it in about 7 places.

So, all things considered the experience I’ve had with the workflow and designer has been ok, but this is a glaring hole. You want to hear my theory as to why this happens, and why the event actually fires after the next task is created?

Sure, I’ll tell you. Since the idea of the tokens is to provide a handle on state before and after serialization, it’s probably referenced by it’s values after it’s deserialized, (i.e. when the code tries to hydrate the instance of the workflow, it pairs the tokens with their owners by the names of the instances.) This means that if you always persisted it, and re-hydrated it before the event fires, the duplicate token declaration ‘in code’ probably wouldn’t matter, because it would be ‘reconnecting them’ by their names. But the problem I experienced is becasue the first and second instances of the tokens are both instantiated right after my workflow starts, and they aren’t serialized yet, and this reconnection doesn’t happen yet… and somewhere inside the workflow’s in-the-box code these tokens are compared by reference or some other non-value hash for equality, instead of by value of their members, and of course, they don’t match that way, they’re separate instances!

I don’t know if fixing it like this is A) stable (in that it will survive any change I make in the designer), B) a good idea, which it probably isn’t. It’s a hack to fix a broken workflow for this one very specific issue, and I’m hoping to heck there is a hotfix for it. I wouldn’t recommend it beyond a build cycle or two. I don’t know when this code is regenerated by the designer, nor what causes it to create separate token instances (I do know that copy/paste is one!). I’ll post back if I find out.

I would classify that as a bug. Good luck folks! Love that workflow!

Tired of the Crappy MS Oracle OLDEB Provider?

February 27, 2008 on 11:43 am | In .NET Coding | No Comments

If you use Linked servers in SQL Server 2005 to connect to oracle servers, and have been frustrated by the slowness of the Microsoft OLEDB provider for Oracle, then we have something in common.

Lucky for you and me, there is a fix.

Step 1: Install the Oracle Client freely downloadable from their website on your SQL server host. (Requires a little spam info.)

Step 2: Configure your client by setting up your TNSNames.ora file as per usual.

Step 3: in Management studio, login as a server admin (sa, or someone with the server admin role)

Step 5: Issue the following commands, (it’s in secret code, so be careful!)


exec sp_addlinkedserver 'linkname', 'Oracle', 'OraOLEDB.Oracle', 'oraclehostname'
exec sp_addlinkedsrvlogin 'linkname', false, null, 'usernamehere','secretpassword'
 

That’s it kids! Just say no to the wizard.

You can then create a view in one of your existing databases that selects from this Link, and almost as magically every tool that hits SQL server can now be using SQL Server as a conduit for Oracle. (It’s better to do this if you’re combining data or something, not just as a very expensive DB provider of course, but it still works however you like it, and the chances are you’ve got another SQL host somewhere that wouldn’t mind pulling some data for your developers convenience.) My FAVORITE feature about doing this is that I can now do LINQ to Oracle. So there! Go Forth, Be Multiply, and Fruitful.

Colorizer Part II.V

December 19, 2007 on 12:55 am | In .NET Coding | No Comments

A friend of mine pointed out to me one possible way of finding syntax errors, and resuming parsing after one successfully. The key is to split the input text on some known divider. It’s possible that your grammar won’t have one, but in the case of, say… C#, you can split all the statements on ‘}’ and ‘;’ . Then you can store each as an independently parsed string (even store it’s parse tree in a linked list or array or something) and then flag each with an error or not. This is a great trick, provided that you have some identifiable block/line terminator to split on. For line based grammars this is exceptionally easy.. just split on newlines… for not line based grammars, you might split on block terminators only, like ‘end’ in ‘Simple.’ At least then you have some smaller unit of ‘parse’ to work with and you can track state of parsing across a document with some degree of efficiency. That also means you can trigger the ‘re-parsing’ of small segments when new text is added or pasted. Very interesting. I’m right now trying to think of a case that there wouldn’t be a way to do this… perhaps some functional languages would be eternally nesting… but as far as I’m concerned that’s a corner case ;)

Building a Syntax Highlighter: Part II

November 26, 2007 on 1:24 am | In .NET Coding | 2 Comments

Ok so you’ve seen that which is my quest, now for a little more slightly boring, yet more useful, background information on how I’m tackling this problem.

First of all, I’ll point out that currently the state of the ’syntax aware editor’ industry is pretty sad. (This is not the case in the world of C# and Visual Basic, and perhaps some of the languages that work in full force in visual studio and or mono/sharp develop. The examples I cited, of course, are out there now, using dynamic compilation in combination with reflection to achieve their magical goals. I’m talking about doing this purely from the position of using a grammar as input and a 100% compiler-free parser generated entirely from the grammar. I’ll cede the point that dynamic compilation is really cool, and can do a lot more than what I’m trying to do in the way of telling you what to do next in your coding, but this project is meant for the casual domain specific language writer.) Most of the tools out there right now are ‘lexer’ driven only, and they almost always rely on some kind of regular expression framework, or worse, find/replace methodology. That might be ok for some smaller projects, but it doesn’t check syntax correctness and it doesn’t scale very well. Two examples right now are Compona SyntaxBox (now a part of the puzzle framework and recently integrated into the Fireball framework), and Scintilla. These tools are great for what they do, but they don’t validate syntax… (SyntaxBox has some rudimentary support for ‘intelligent sensing auto-completion,’ which I so term for not wanting to step on anyone in Redmond’s trademarks.)

Why a Parser?
So, if the state of the art in the non-commercial software arena of this field is to just use text (lexical) matching alone, why do we need a parser? I have already alluded to the answer… Syntax Checking. A parser can tell you if what you’ve got is valid according to the language rules… A tokenizer, (also often termed a lexical analyzer, a scanner, or a ‘lexer’, for short,) the current type of commonly employed tool, can just tell you if what you’ve got is a keyword, constant, whitespace, comment or whatever other kind of language token you might have. Knowing what kind of token you have is ok, if all you’re doing is making pretty colors, but if you type some ridiculous string of keywords, having them all bolded and colored doesn’t amount to anything useful if it’s improperly sequenced. For example, I’ll use javascript:


    if (x > 12) x -= 5; // valid
 
    if ++x {} else {} for; //still produces colors but makes no sense..

Both of these two lines will be happily highlighted by a lexer driven system, but only a parser will be able to tell you that the second line is meaningless gook.

How I do Parsing
In an acronym, LALR(1). What that stands for is Look Ahead Left to Right, and the 1 means I have one symbol of lookahead. The LR part just means I read the input text from Left to Right, and the Lookahead means that if two or more types of symbols start out the same, I can look at the next symbol in order to differentiate them. I will leave the exact details of what Lookahead means to the reader (we have wikipedia and google, so I’m not going to pretend to be a lesson in parsing 101.) One thing I will cover in a little more detail is the ‘Shift-Reduce-Accept’ cycle works in LALR. When a parser gets some input text, it’s fed to a Lexical Analyzer (lexer) and compared against a table of known text patterns (called the parsing tables) character by character. Each new character of input is compared against the starting points in the lexer section of the parsing tables to see which are the matches. Then, the next character in the input sequence is checked against the list of all possible valid ‘next characters’ based on the set of matches generated by the first character. As more characters are added to the stream of the lexer’s input, it is able to eliminate more and more of the possible alternatives, until ultimately, it will have a single exact match. A lexical match generates what is known as a ‘Token’ which is fed back to the parser, and stored on a stack or a list in the parser’s working memory. After returning (Fetching) a token, the lexer is technically done with it’s duty, and the parser resumes control. Upon fetching, the latest few tokens are compared against a ‘rule table’ which is a special table in the parsing tables that specifies sequences of whole tokens, not just characters. A match in this rule table will trigger all of the component tokens to be pulled off the list or stack and they will be replaced by a reference to the rule which they match. This ‘rule matching’ is known as a ‘reduction’ action. If no rule is yet matched after a token, the parser will continue invoking the lexer to grab new tokens and stash the results… this grabbing and queuing of tokens is known as ’shifting.’ So, upon acquiring a new token, the parser will try to perform a reduction, if it cannot make a valid reduction from the list of recently read tokens, it will continue invoking the lexer for tokens, and shifting the results until it can. When there are no more tokens in the input stream the parser has a decision to make. If all of the tokens can be reduced, then it will return an Accept message (meaning, yup, ok we read the input and it’s all within the language, following the language definitions to the letter,) or it will return a parsing error message, usually supplying some indication of what the last tokens were, and what tokens it was expecting but didn’t find. There is one more layer of complexity at work here… The parser will do this shifting and reducing as I’ve described, but it has to have a defined starting point, which also happens to be it’s ending point. There has to be some rule which contains some combination of the other valid rules that can serve as a unit of validity for the entire input text. This is known as the Start Rule (or Start Symbol.) The parser cannot just willy nilly match rules without first matching the start rule. If you’re parsing a program, for example, you’d have to define somehow what constitutes a program… You might say a program is a collection of statements, and a statement any one of a few predefined statements, a block, or a lonely expression. A comment would not be considered a statement by that definition, and if you say you’re looking for a collection of statements (one or more statements), and your entire source text is just a bunch of comments, then the parser will not produce an Accept message on the source, because the start rule wasn’t matched.

Gold Specific:
Gold is a fantastic tool, known rather inaptly* as a compiler-compiler, that will physically produce these aforementioned parsing tables. But it still needs some specification of your tokens and rules (and start symbol) in order to do so. This specification is called a grammar, which serves as a way to specify the set of all input that is a member your defined language. (In language parlance we say that a grammar ‘generates’ a language, meaning that it specifies the entire set of valid input considered to be in the language, even though nothing is actually physically ‘generated’.) If you grab a copy of Gold Parser Builder, you can crack open the help files and see a few example grammars. On the website, you can also download a number of them, one of which is called ‘Simple.’ Simple is a grammar which specifies a BASIC-like language for use in examples and testing of the builder tool itself. Since I’ll be referring to it from time to time as my example language, you can find it below. This grammar was written by the author of Gold Parser Builder, and the master and commander of the entire Gold world, Devin Cook.


"Name"    = 'Simple'
"Author"  = 'Devin Cook'
"Version" = '2.0'
"About"   = 'This is a very simple grammar designed for use in examples'
 
"Case Sensitive" = False
"Start Symbol"   = <Statements>
 
{String Ch 1} = {Printable} - ['']
{String Ch 2} = {Printable} - ["]
 
Id            = {Letter}{AlphaNumeric}*
 
! String allows either single or double quotes
 
StringLiteral = ''   {String Ch 1}* ''
              | '"'  {String Ch 2}* '"'
 
NumberLiteral = {Digit}+('.'{Digit}+)?
 
<Statements> ::= <Statement> <Statements>
               | <Statement>
 
<Statement>  ::= display <Expression>
               | display <Expression> read ID
               | assign ID '=' <Expression>
               | while <Expression> do <Statements> end
               | if <Expression> then <Statements> end
               | if <Expression> then <Statements> else <Statements> end
              
<Expression>   ::= <Expression> '>'  <Add Exp>
                 | <Expression> '<'  <Add Exp>
                 | <Expression> '<=' <Add Exp>
                 | <Expression> '>=' <Add Exp>
                 | <Expression> '==' <Add Exp>
                 | <Expression> '<>' <Add Exp>
                 | <Add Exp>
 
<Add Exp>      ::= <Add Exp> '+' <Mult Exp>
                 | <Add Exp> '-' <Mult Exp>
                 | <Add Exp> '&' <Mult Exp>
                 | <Mult Exp>
 
<Mult Exp>    ::= <Mult Exp> '*' <Negate Exp>
                | <Mult Exp> '/' <Negate Exp>
                | <Negate Exp>
 
<Negate Exp>  ::= '-' <Value>
                | <Value>
 
<Value>       ::= ID          
                | StringLiteral
                | NumberLiteral
                | '(' <Expression> ')'
 

I’ll come back to this later in more detail, but for now, notice the Start Symbol is defined at the top as <Statements>, and <Statements> is defined as a list of one or more <Statement>s, and of course that comments are not valid statements. So hopefully this helps to illustrate what I was saying about how LALR works.

To be continued… again.

*FOOTNOTE: I say that it’s inaptly named a ‘compiler-compiler’ because a lexer and parser are only the first two components of what is known as the ‘front end’ of a compiler. You cannot produce an entire compiler from the front end, and you cannot even produce the entire front end in most cases using a so-called ‘compiler-compiler.’ There are other very famous tools also known as compiler-compilers, not the least of which being Lex/Yacc (unix) or Flex/Bison (Gnu versions of Lex and Yacc.)

Annoying Me Softly, the RichTextBox

November 24, 2007 on 4:05 pm | In .NET Coding | No Comments

This is a side note to my parser project. I’m using a rich text box to display colorized syntax, and it’s really annoying me now that when I set the .Rtf property with a trailing newline (\par) the .Text property pretends I didn’t put it there. In fact, the .Text property seems to like trimming my text when I set it via the .Rtf proprety, but if I set it directly via the .Text, I don’t get said trimming. Can someone please tell me why this has to be this way? I’ve got the whole syntax editor thing worked out and it highlights according to a mapping I’ve defined for the tokens, and the parser DFA generated by Gold, but when I end the input with a newline character, it truncates it and shovels me back up to the previous line. How is it possible that I’m the first person to run into this? Here’s what happens:


MyRTFBox.Rtf = "someRTFgoodies...LastWord\r\n}\r\n\par";
 
string foo = MyRTFBox.Text;

when I put a debug watch on foo… it comes back fully formed having removed the trailing whitespace. “someRTFgoodies…LastWord”;

What?!? How can I work with that?

I’d like to thank Microsoft one more time for assuming they know what I’m trying to do and just doing it for me. You guys really know how to make my life easier.

EDIT: I was looking at the source to the RichTextBox in the Reflector, and I notice that this behavior is caused by the way they’re copying the text as a byte array (to account for various possible methods of character encoding.) and in the Encoding process one of the ‘helpful’ side effects happens to be the trimming of trailing whitespace. The only way I can get around this is to write my own RTF box? Does anyone out there know another way here?

Edit Edit Edit: It has finally occurred to me what to do about this. Since this problem goes away if there is text after the enter key, I just don’t trigger the parsing when enter is pressed. Duh. Thanks for being so patient with me… but STILL it’s silly of the richtextbox to prune my text without asking.

Building a Syntax Highlighter using Gold Parser Builder: Part I

November 23, 2007 on 12:24 am | In .NET Coding | 1 Comment

Here we are. Finally something useful again on my blog, instead of some drivel about the crap I’ve been working on with no code samples. This won’t be a HOWTO from start to end but rather a narration of my efforts on the subject. I intend to have it working at the end, so by the time I post the final installation, it will indeed be a HOWTO.

I’m working on a syntax-checking text highlighter (colorizer), and I’ve discovered that it’s much more difficult than I’d thought it would be. I have written parsers as front ends for a compiler before. I’ve written recognizers that tell you if you’ve made a syntax error… I assumed it might be a trivial exercise then to make myself a syntax highlighting component that can plug into an editor, but I forgot some terribly obvious points… I’ll start by listing these:

White Space Matters
That’s right. Whitespace matters even if the language I’m parsing doesn’t care about white space because coders will write code with all kinds of white space to facilitate readability. This is my number one issue. It’s the one I haven’t entirely decided how to surmount.

Even bad Text has to Stay in the editor
If there happens to be some code in the editor that doesn’t parse, I cannot ignore it, and I have to save it in the text stream because otherwise it would be destructive to the whole idea of ‘unfinished code.’

You must parse all of the input text at some logical moment
If your users are editing some code, you won’t want it to miss something, so it’s got to be parsing pretty often. How often is too often? Well, I don’t know. I’ll say for now that I’m going to do it at every keystroke, and I’ll parse all of the input at once. This second bit about parsing all of it at once will eventually have to change for two reasons. I have to continue parsing after an error, because it’s possible that the rest of the document is valid, and should be highlighted, leaving just the lonely error bit to be complained about. The other reason is that documents can be huge. Immense. Large. It doesn’t make sense to parse the entire input file on every key stroke unless the input is below some reasonable threshold. (Like for example, the stuff that’s visible on the screen.)

Mainly these are the only three big issues, but they are indeed big. There are products out there that help and that are meant to do just syntax highlighting, or those that do so via regular expressions or find/replace etc… but these are computationally intense, and they usually don’t offer the ability to point out syntax errors… For example, Scintilla.. it will highlight text all purty-like, but it doesn’t tell me that I need a semi colon at the end of the line when one is missing. The same goes for the tools like SyntaxBox etc. They’re great tools, but they have limited functionality. Nope, I can’t cop out and cheat on this one. I have to implement something real, and that doesn’t require visual studio integration. (Which is why I can’t use a Babel language service.)

Gold Parser Builder is my parser generation tool, and since I work in .NET (or rather since I am on this project) I’ll be using .NET engines for the task. The two engines with which I am familiar are the Calitha engine, created by Robert van Loenhout, and the Morozov engine, created by Vladimir Morozov. The parsing technique exposed by Gold is LALR(1).

The first task is to generate a parser that’s capable of passing over the source text and actually parsing it. With gold parser that’s incredibly simple.. there is a grammar designer in which you specify the language to be parsed in a series of EBNF rules. After the builder checks the grammar, it will generate a deterministic finite automaton (DFA) which can recognize the specified language of the grammar. There is even a great testing facility built into the builder which allows me to test my grammar for correctness. I can throw sample input text at the generated DFA and it will tell me if the text parses or not, and throw a detailed trace and error log if for some reason that it will not.

I’ve been active on the Gold Parser Builder mailing list and the google group for the project, and I’ve raised the issue of building a tool such as a syntax checking text colorizer and asked for help from the community. I received a response from a certain Kelly Parker, with whom I’ve begun working on the task. I’m also working independently on a commercial version of this project for use with a certain not-yet-disclosed entity, but concurrently, I’m producing a more generic open source version that decouples the language itself from the components of the parser from the highlighting/checking utility for use in an editor. Kelly and I are working on this open source version. (The two implementations are completely different, because they have different goals. Whenever I refer to ‘the project’ from now on, I’ll be talking about the open source version.)

To be continued…

Early Christmas this year!

November 19, 2007 on 10:02 am | In .NET Coding | No Comments

Visual Studio 2008 drops this week as an RTM. I’m going to be checking the website about every 3 hours now. Whee!

C# Template for GoldParser Builder and the Morozov Engine

October 15, 2007 on 12:58 am | In .NET Coding | 1 Comment

I decided to give a shot to creating a template for Gold Parser Builder that implements a C# instance of a parser making use of the Morozov Engine, which is known in the Gold Parser world as the fastest of the C# engines.

Thanks to Devin Cook for making this all possible by creating Gold Parser Builder, and also to Vladimir Morozov for his contribution of the great Gold engine for us C# geeks. (a few of the structure-like features of this template were.. uh.. borrowed from the Calitha template, but nothing really ‘codey’, just ’structurey’)

You can get the template here, in case you’re interested: http://davedolan.com/downloads/C-SharpMorozovNet2.pgt

Updated I forgot to add the definition of SymbolException and RuleException to the original download (it was correct in the version that went to the list, I just got a little happy cleaning up the script and … chopped two things too many. It is FIXED now.

I’ve also added a .Net 1.x compatible version of this template: http://davedolan.com/downloads/C-SharpMorozovNet1.pgt

Don’t even ask about .Net 3, and if you were thinking ‘what? why not?’ then perhaps you should not be making parsers, even with templates ;)

Next Page »

Powered by WordPress with Pool theme design by Borja Fernandez.
Entries and comments feeds. Valid XHTML and CSS. ^Top^