Far too late in the morning musings...
Apr. 3rd, 2003 01:46 am
So, first, I should be asleep now. I can't afford to stay up late like this.
I went to bed at 10:30. It's now quarter to one. Many many things wandering through my head to keep me awake...
First -- I should be able to write a program that writes the import programs for a scannable form. Obviously it won't handle all the intricacies of each form, but it should at least get _most_ of the work out of the way (ie, those damned calls to $tdb->set_nn_value()). This program should take a form name as input, connect to TrialDB, get the structure of the form, and generate a template import program. It would take care of making sure to call $tdb->process_date() on all date values. It should probably connect to the source database as well, so it can get the definitive list of columns and write the stupid "my ($var1, $var2, ...) = $db->selectrow_array('select var1, var2, ... from table_name where id = ?', undef, $id)" statement -- especially since for the cs_epidemiology form, that's likely to be about three hundred variables. Actually, I should look at $sth->fetchrow_hashref() and see if that's useful. It can then query TrialDB to figure out which columns are dates, and try simplistic matching for repeating groups. Obviously I'll have to go back and fix up some of the generated program's handling of various things (like those "check all that apply" ones...). It would be nice to get it to look at the skip logic in TrialDB and implement it for the import, so that I don't have to. To do that I'll have to figure out how TrialDB stores the skip logic, but I should probably do that anyways...
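A rough sketch of the generator idea, written in Python and emitting Perl text. It reads the column list from the source database through the DB-API `cursor.description` (demonstrated against a throwaway in-memory SQLite table), and takes the set of date columns as a plain argument, standing in for the TrialDB lookup. The helper names `source_columns` and `generate_import_template` are hypothetical. The `set_nn_value()` calls are left as TODO stubs because their arguments aren't spelled out here, and the `process_date()` calling convention (raw value in, normalized value out) is an assumption. The generated statement binds the key through a `?` placeholder with `undef` for the attribute hash, which is DBI's `selectrow_array($statement, \%attr, @bind_values)` form.

```python
import sqlite3


def source_columns(conn, table):
    """Column names of `table`, read from DB-API cursor.description.

    description is filled in even when the query returns no rows. The table
    name is pasted into the SQL, so it must come from a trusted list.
    """
    cur = conn.execute(f"select * from {table} limit 0")
    return [d[0] for d in cur.description]


def generate_import_template(table, columns, date_columns, key="id"):
    """Return the text of a skeleton Perl import program for one form."""
    cols = [c for c in columns if c != key]  # the key is bound, not fetched
    perl_vars = ", ".join(f"${c}" for c in cols)
    col_list = ", ".join(cols)
    lines = [
        f"my ({perl_vars}) = $db->selectrow_array(",
        f"    'select {col_list} from {table} where {key} = ?', undef, ${key});",
    ]
    for c in cols:
        if c in date_columns:
            # ASSUMPTION: process_date() takes the raw value and returns the
            # normalized date; check the real TrialDB signature.
            lines.append(f"${c} = $tdb->process_date(${c});")
    for c in cols:
        # set_nn_value()'s arguments aren't spelled out, so leave a stub.
        lines.append(f"# TODO: $tdb->set_nn_value(...) for ${c}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Demo against a throwaway in-memory table standing in for the source DB.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table cs_form (id integer, dob text, smoker integer)")
    cols = source_columns(conn, "cs_form")
    print(generate_import_template("cs_form", cols, date_columns={"dob"}))
```

Generating text rather than calling TrialDB directly keeps the output editable, which matters since the "check all that apply" and skip-logic cases still need hand work.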
The next issue is this whole "determine eligibility / get the core automatically / politics" mess. Realistically, I have the 'determine eligibility' piece. If we _can_ get the core, Nora should rewrite her program to look into the core extract data as well. This means that I need to finish up the new method for importing core data -- and I can make use of TrialDB::ImportScannable, although I should probably evaluate whether I need that or should create a "TrialDB::ImportCore" as well -- this one containing routines to extract core variables. I should probably take some of my import logic and move it into a "TrialDB::Import" module, but I'm not sure that I'm ready to do that. That module should probably support call-backs to handle mapping data, and various other things, which I don't think I'm ready to work with yet. So the issue now is how to get the Core. Unfortunately, I can't actually dictate how UCI will do things. But I imagine that right now, they are an MS shop. The kind of output that we get from them, not to mention the kind of technical responses I've gotten, seems to indicate that nobody there has a clue about unix-y stuff, which is a pity, because automating things on Win32 is nowhere near as easy as on *nix. After all -- Win32 was originally a desktop OS, whereas *nix was originally server-oriented. OTOH, right now I don't have a *nix box which can talk to our Oracle server _anyways_, so I'm going to be working on a Win32 machine myself. Probably the best way to handle this is with emails. There _must_ be POP3 modules for Perl, which would allow me to connect to a mail server and retrieve any new messages. Thus, I can add a receiving address to hedwig. Except then hedwig needs to run a POP3 server. Hmmm.... must think more on this. I also need to look at Perl with public key cryptography -- it looks like Crypt::RSA will provide the necessary modules, and it looks like there's a Win32 distribution of Math::Pari, which is the only module it claims to require.
I should install it to make sure... So once I have the cryptography, all I need is the info on how they call into their database to get the data out. I may be able to script this via OLE if it requires some kind of Access db -- but hopefully they have a command-line-ish program anyways...
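For the email route, a minimal sketch of polling a mailbox, using Python's standard `poplib` rather than a Perl module (Perl's libnet distribution does ship `Net::POP3` for the same job). The host, user, and password are placeholders, it assumes POP3 over SSL on the default port, and it leaves messages on the server; the decryption step isn't covered. `fetch_new_messages` and `parse_retr_lines` are hypothetical helper names.

```python
import poplib
from email import policy
from email.parser import BytesParser


def parse_retr_lines(lines):
    """Rebuild one message from the list of byte lines that POP3.retr() returns."""
    return BytesParser(policy=policy.default).parsebytes(b"\r\n".join(lines))


def fetch_new_messages(host, user, password):
    """Download every message currently in the maildrop, leaving it on the server.

    POP3 has no real notion of "new" beyond what is still in the maildrop,
    and deleting messages (POP3.dele) is not handled here.
    """
    box = poplib.POP3_SSL(host)  # port 995 by default
    try:
        box.user(user)
        box.pass_(password)
        count, _mailbox_size = box.stat()
        messages = []
        for i in range(1, count + 1):  # POP3 message numbers start at 1
            _response, lines, _octets = box.retr(i)
            messages.append(parse_retr_lines(lines))
        return messages
    finally:
        box.quit()
```

Note that this is the client side only: whichever machine runs it still needs some mail server holding the mailbox, which is the hedwig question above.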
Of course, that brings me back to the extract program. I need to rework the part that interprets the spec. I want to be able to do something like "Ovarian_Cancer(!ovarian_baseline -meta_dictionary roca_ca125_roca_values)+pids=...+bad_date_to=NULL" -- that is, support clusters and subtracting options. (Clusters may mean reworking how I get the data out: with the way it currently works, what happens if different clusters share the same question group? Must think on this.) I think that I only need to handle subtracting the "special" options -- that is, meta_data, meta_dictionary, meta_messages. You don't need to be able to subtract regular clinical data question groups. The other upgrade is that I should create a new table (like, say, create table extract_types (study_id integer references studies (study_id) on delete cascade, extract_name varchar(255), extract_description varchar(1000), spec varchar(1000))) -- this would let me keep a list of predefined extract types (with, of course, cute names, like 'the whole hog', 'soylent green', 'mo..' (who memorized the dictionary, although I'm not sure if this means that they get only the dictionary or that they don't get the dictionary), etc...). But I think this will require a major rework of the spec part of the process...
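One way the spec syntax above could be parsed, as a Python sketch under an assumed grammar: a leading name (Ovarian_Cancer in the example), an optional parenthesized list of whitespace-separated items, then `+key=value` options. A `-` prefix is treated as subtraction and is only allowed on the special options (meta_data, meta_dictionary, meta_messages); the meaning of the `!` prefix isn't defined here, so the parser just records it. It handles one parenthesized cluster per spec, does not address clusters sharing a question group, and assumes option values never contain `+`. `parse_spec` and `ExtractSpec` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Options that may be subtracted with a leading "-"; regular clinical
# question groups cannot be.
SPECIAL_OPTIONS = {"meta_data", "meta_dictionary", "meta_messages"}

# NAME, optionally followed by "( item item ... )" with no nested parens.
_HEAD = re.compile(r"^(\w+)(?:\(([^()]*)\))?$")


@dataclass
class ExtractSpec:
    name: str
    items: list = field(default_factory=list)    # (prefix, item) pairs; prefix is "", "!" or "-"
    options: dict = field(default_factory=dict)  # from "+key=value" suffixes


def parse_spec(spec):
    """Parse e.g. 'Study(item -meta_data)+key=value' into an ExtractSpec."""
    head, *opts = spec.split("+")  # ASSUMPTION: values never contain "+"
    m = _HEAD.match(head.strip())
    if not m:
        raise ValueError(f"bad name/cluster part: {head!r}")
    result = ExtractSpec(name=m.group(1))
    for token in (m.group(2) or "").split():
        prefix = token[0] if token[0] in "!-" else ""
        item = token[len(prefix):]
        if not item:
            raise ValueError(f"empty item in {head!r}")
        if prefix == "-" and item not in SPECIAL_OPTIONS:
            raise ValueError(f"only special options can be subtracted, not {item!r}")
        result.items.append((prefix, item))
    for opt in opts:
        key, sep, value = opt.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"option must look like key=value: {opt!r}")
        result.options[key.strip()] = value.strip()
    return result
```

For the example spec this yields the items ("!", "ovarian_baseline"), ("-", "meta_dictionary") and ("", "roca_ca125_roca_values"), plus the options {"pids": "...", "bad_date_to": "NULL"}; a spec string stored in an extract_types row could be fed through the same function.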
And of course, an installer for the extract system -- it should be able to use the Win32::ODBC stuff to create the ODBC connections needed, use PPM to install any required modules, and generally prepare the system for the program to run. Possibly, if I have the program create the ODBC connections, I don't have to put the files there in advance? I create them anyways, so if I can just have the connection be created, I don't even need to call 'new_extract.pl generate'.
I want to look at changing the interface to TrialDB, but for now I'm content to hold off on that until the next upgrade. Then, I want to redo at _least_ the 'study select' screen, 'cause it's awful. And not in the good way. There are a number of other screens that would be good to redesign. But really, what I want to do is get started on rewriting the Access client in .asp -- so that we can lose the whole Access client and get everything doable from any computer. Ideally, I should be able to write it without requiring an MS browser, but I should talk with Jeff about how feasible this is -- I would like to be able to have it work on Mozilla though, 'cause then it could be done from _any_ platform.
Which brings up a question -- why isn't mod_asp compatible -- is there some way that I could have this at least being hosted off of *nix? I should look into whether there's some way to fix this. And of course, that makes me want to look at why it requires an MS browser... there should be some way to do this stuff without it being a platform-specific issue... -- well, with mod_asp at least, it just doesn't exist. There is a mod_asp.net, but it only runs on Win2K boxes anyways, and otherwise I have to go with a commercial product, and I'm not willing to make that case yet (although at $495 for Sun's product, it's pretty cheap... about the same as the various Perl technologies I want...)
OK -- so, I've been dumping here for a while (it's now quarter to two) -- hopefully I can go to sleep?