Welcome to the FoFPC Research Notes: "America Competes Act"

Data Sharing and America Competes Act

by: G.E. Ozz Nixon Jr.
Published: August 2007
Copyright © 2009 by Friends of FPC



     Today, August 9th 2007, President George W. Bush signed into law the "America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science Act" (America COMPETES Act), which requires civilian federal agencies to provide guidelines, policies and procedures that facilitate and optimize the open exchange of data and research between agencies, the public and policymakers.

     Those of you familiar with my work on data replication, store-and-forward technologies, data dictionary compression, and narrow- and wide-band satellite communications for world news broadcasts know this was an exciting step for those of us in the field of moving, searching and reporting data of all types.

     The National Institutes of Health (NIH) defines "data" as "recorded information, regardless of the form or medium on which it may be recorded, and includes writings, films, sound recordings, pictorials, drawings, procedural manuals, forms, diagrams, work-flow charts, data files, data processing, statistical records, and other research data". In short, this is a realistic list of the types of data that must be searchable, and that data is far more than simple records in an SQL database engine. By that standard, "data replication" becomes a hollow phrase when it is used to market mere SQL database copying.

     As mentioned, I have experience in narrow- and wide-band communications for world news: designing UPI (United Press International, founded in 1907) structures and the actual encoding and decoding software for companies like Planet Connect. This information is broadcast over satellites worldwide in real time, one way, outbound only, which makes searching and hashing techniques a requirement for following activities happening across the globe right now. The software had to support potential 7-bit platforms by translating all images into the LCD (lowest common denominator), pixel RGB encoding as 7-bit, without data loss and without introducing noticeable latency.
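The article does not say which 7-bit encoding the broadcast software used. As a minimal sketch, assuming a base64-style scheme (Python's standard `base64` module, a swapped-in technique rather than the author's), this shows how arbitrary 8-bit RGB pixel bytes can cross a 7-bit channel and be restored byte-for-byte. The pixel values are made up for illustration.

```python
import base64

def to_7bit(pixels: bytes) -> bytes:
    """Encode raw 8-bit RGB bytes as 7-bit-safe ASCII.

    base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', all below 128.
    """
    return base64.b64encode(pixels)

def from_7bit(encoded: bytes) -> bytes:
    """Decode the ASCII text back into the original pixel bytes."""
    return base64.b64decode(encoded)

# Hypothetical 2x2 image: three bytes (R, G, B) per pixel, several values above 127.
pixels = bytes([255, 0, 0,  0, 255, 0,  0, 0, 255,  200, 150, 100])
encoded = to_7bit(pixels)

assert all(b < 128 for b in encoded)   # every byte fits in 7 bits
assert from_7bit(encoded) == pixels    # the round trip is lossless
print(len(pixels), "->", len(encoded), "bytes")  # prints: 12 -> 16 bytes
```

The trade-off is size: base64 emits 4 output bytes for every 3 input bytes, roughly a 33% overhead, which matters on a narrow-band link where bandwidth and latency are both constrained.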

     In recent years, I was involved in designing a database replication solution. Unfortunately, it was driven by narrow-minded visionaries who did not understand that data includes the full range of information the NIH describes as "data". Sharing data is more than posting a folder of files on an FTP server; the data must be mined (analyzed and organized). This is another point so many "data replication" experts miss: introducing XML, for example, does not fix an ongoing data-mapping problem. It simply introduces yet another problem to be solved (YAP2BS).

Common Problems with "data"
   < data source >
      |
      ├─ Type
      ├─ LCD (lowest common denominator)
      ├─ Age
      ├─ Priority
      ├─ Keywords
      ├─ Self-Hash
      ├─ Copy or Link
      ├─ Ownership
      ├─ Read-only
      ├─ Original Source
      └─ Audit-trail
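One way to make the list above concrete is a metadata envelope that travels with each shared item, one field per question. This is a hypothetical sketch: the class and field names are illustrative and not from any existing system, "Self-Hash" is assumed here to mean a SHA-256 digest of the content, and the priority convention (smaller is more urgent) is an assumption.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataItem:
    """Hypothetical metadata envelope for one shared item (one field per list entry)."""
    content: bytes                 # the raw item itself: image, text, PDF, ...
    data_type: str                 # Type, e.g. "image/png" or "text/plain"
    lcd_text: str                  # LCD: plain-text rendering (OCR output, transcript, caption)
    created: datetime              # Age
    priority: int                  # Priority: smaller number = more urgent (assumption)
    keywords: set[str] = field(default_factory=set)          # Keywords
    is_link: bool = False          # Copy or Link: True if only a pointer is held
    owner: str = ""                # Ownership
    read_only: bool = True         # Read-only
    original_source: str = ""      # Original Source: identifier of the originator
    audit_trail: list[str] = field(default_factory=list)     # Audit-trail

    @property
    def self_hash(self) -> str:
        """Self-Hash: SHA-256 of the content, so copies can be matched and verified."""
        return hashlib.sha256(self.content).hexdigest()

# Usage with made-up values.
item = DataItem(
    content=b"RED Ford Sports car",
    data_type="text/plain",
    lcd_text="RED Ford Sports car",
    created=datetime.now(timezone.utc),
    priority=2,
    keywords={"red", "ford", "sports", "car"},
    owner="hypothetical-agency",
    original_source="urn:example:1234",
)
print(item.self_hash[:16])
```

Hashing the content rather than the whole envelope means two agencies holding the same item under different owners or priorities still compute the same Self-Hash.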

     Before data can be analyzed and organized, it must be understood; the brief list above shows common problems with RAW data. As a developer you must think beyond the database and focus on the task at hand: sharing "data". So, what Type of data is this piece of information? Image, text, PDF, spreadsheet, written document, etc. The answer to this question opens data sharing up to a wide range of software products and companies. For example, data replication for hospitals includes all of the above formats along with proprietary structures and even voice recordings. For such an item/element to be of use, one must find a library or company that provides an API for handling data of that type. This also means a lot of programs still need to be written: data replication is just starting to get traction, and it will be years before the companies that pioneered this industry grasp the potential of the data. That is primarily due to their size and to their failure to see that what they market today compares to the future of data sharing the way mid-90s Yahoo! compares to Google today and where it is going.
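Answering "what Type of data is this?" often starts with content sniffing: checking a file's leading signature bytes instead of trusting its name or extension. The sketch below covers only a handful of well-known signatures and is an illustration, not a complete detector; the function name is hypothetical, and production systems typically rely on a fuller signature database such as libmagic.

```python
# Well-known leading signatures ("magic numbers") for a few common formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # also the container for .xlsx/.docx files
]

def sniff_type(data: bytes) -> str:
    """Guess a MIME type from the first bytes of an item (hypothetical helper)."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # WAV voice recordings: "RIFF" header with "WAVE" at offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav"
    # Fall back to text if the bytes decode cleanly as UTF-8.
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"

print(sniff_type(b"%PDF-1.4 ..."))  # prints: application/pdf
```

A ZIP signature only says "container": telling a spreadsheet from a word-processing document requires looking inside the archive, which is exactly where the type-specific library or API mentioned above comes in.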

     Another concern with the "data" is finding a lowest common denominator and capitalizing on it. With voice recognition, OCR and image recognition, all science fiction in the 1990s and reality today, the LCD in human terms is text. Age and Priority come to mind when working with law enforcement and health care data. Gathering or generating Keywords and a Self-Hash Dictionary is extremely critical, and this is another area where the LCD comes into play: you might document an item/element as a "RED Ford Sports car", while someone else searching for that exact item/element enters a witness's words, "RED Mustang". The only common factor, RED, is a very weak match compared with the other words.
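The "RED Ford Sports car" versus "RED Mustang" problem can be put in numbers. This sketch uses Jaccard similarity (shared terms divided by all distinct terms), a standard set-overlap measure swapped in for illustration, and a small hypothetical expansion dictionary standing in for the author's Self-Hash Dictionary, whose actual design the article does not describe.

```python
def keywords(text: str) -> set[str]:
    """Lower-case, whitespace-split keyword set (deliberately naive)."""
    return set(text.lower().split())

# Hypothetical expansion dictionary: a term maps to related terms it implies.
EXPANSIONS = {"mustang": {"ford", "sports", "car"}}

def expand(terms: set[str]) -> set[str]:
    """Add every related term the dictionary knows for each input term."""
    out = set(terms)
    for term in terms:
        out |= EXPANSIONS.get(term, set())
    return out

def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|: 1.0 for identical sets, 0.0 for disjoint ones."""
    return len(a & b) / len(a | b) if a | b else 0.0

catalog = keywords("RED Ford Sports car")
witness = keywords("RED Mustang")

print(jaccard(catalog, witness))          # prints: 0.2 (only "red" is shared)
print(jaccard(catalog, expand(witness)))  # prints: 0.8 (after dictionary expansion)
```

Without the dictionary, the only overlap is the low-value term "red"; with it, four of five terms match, which is the kind of gap a keyword dictionary is meant to close.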

     Then you get into ownership: a copy of the data versus linked (pointer) data. Is the data read-only, or can it be revised? If a revision occurs, does it flow back to the originator or travel only as a tag? Then, for security: where is the originating item/element, and how did it reach its current state in the search results? Then introduce NPI (Non-Public Information), PCI (Payment Card Industry) compliance and other certifications under which you must protect the data. Do you see how this can be exciting, especially in the current economic state? Opportunities just opened because a nation's leader is starting to grasp the concept of data sharing for technology, education and science... or as I call it, Finances-Research-Education-and-Everything (FREE). 'Free' data sharing has major hurdles, and it needs programmers who understand that "together" we can all contribute to data sharing.
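The questions of where an item originated and how it reached its current state are provenance questions. One common way to make such an audit trail tamper-evident is a hash chain, where each entry stores the hash of the entry before it. This is a minimal sketch of that general technique, not the author's design; the function names and actor labels are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first (originating) entry

def entry_hash(entry: dict) -> str:
    """SHA-256 over a canonical JSON rendering of one audit entry."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()

def append(trail: list[dict], actor: str, action: str) -> None:
    """Add an entry that commits to the hash of the previous entry."""
    prev = entry_hash(trail[-1]) if trail else GENESIS
    trail.append({"actor": actor, "action": action, "prev": prev})

def verify(trail: list[dict]) -> bool:
    """True if every entry still points at the unmodified entry before it."""
    if trail and trail[0]["prev"] != GENESIS:
        return False
    return all(trail[i]["prev"] == entry_hash(trail[i - 1]) for i in range(1, len(trail)))

# Usage with hypothetical actors: originator, read-only copy, tag added downstream.
trail: list[dict] = []
append(trail, "county-hospital", "created record")
append(trail, "state-registry", "copied (read-only)")
append(trail, "analyst-17", "added tag: priority=high")
print(verify(trail))  # prints: True

trail[1]["action"] = "revised"  # silently rewrite history
print(verify(trail))  # prints: False
```

One design caveat: nothing follows the newest entry, so an edit to it goes unnoticed unless its hash is also published or stored somewhere the editor cannot reach.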

     This year I have worked on a project code-named "Live Academia", a powerful search engine for academic data. The data is collected from individual high-school students, verified by their school counselors, and searchable by both counselors and colleges. On the other side of the coin, the colleges all share their academic schedules, demographics, scholarship opportunities, etc., which the high-school students search, while the colleges search the student data to find candidates to solicit. This project lets me develop a web site that operates like a powerful desktop application, with potentially millions of records, all linking to thousands of granular pieces of information. This data was not being shared before this project, or at least not on the scale it is now.

     The largest problem with data sharing prior to this act was "data withholding". For the past 5 years I have dealt first-hand with people who will gladly search the data in the search engine but refuse to contribute their own. Because this act promotes a new type of thinking, "to facilitate and optimize the open exchange of data and research between agencies", people like myself all have the opportunity to design the future of data sharing. See my other research on this topic, where I explain different search techniques, optimizing data for searches, techniques for presenting "did you mean ..." suggestions, and how to migrate and propagate data.

G.E. Ozz Nixon Jr.
Copyright 2009 by 3F, LLC. All rights reserved. Worldwide.