Open Data Candidate Requirements and Risk Evaluation
Open Data Candidate Information |
|
Data Candidate: |
Business License Data |
Data Description: |
Business license data is comprised of data fields collected during the business license registration process. Every entity doing business within the City of Seattle limits is required to annually obtain a business license. See SMC 5.30.030 for more detailed information. |
Business Database Owner: |
Denise Movius, Revenue and Consumer Affairs |
Technical Owner: |
Vicki Childs, BT Applications Group |
Technology: |
SQL Database |
Potential Audience: |
City of Seattle Constituents |
Current Audience: |
SLIM business license data is used by a variety of groups within Revenue and Consumer Affairs. The SLIM database also contains a variety of fields used for internal purposes. |
Process for Evaluating FAS Datasets
Dataset Evaluation Process Diagram
Guiding Principles Requirements
Guiding Principle 1: Complete
Guiding Principle 2: Primary
Guiding Principle 3: Timely
Guiding Principle 4: Accessible
Guiding Principle 5: Machine Processable
Guiding Principle 6: Non-Discriminatory
Guiding Principle 7: Non-Proprietary
Guiding Principle 8: License Free
Guiding Principle 9: Customer Service
Appendix A: Data Field Elements and Recommendation
The FAS Open Data Candidate Requirements and Risk Evaluation document is a response to a request from the City of Seattle’s Chief Information Officer, Bill Schrier, to City departments to make data available on data.seattle.gov. This request has the backing of both the Mayor and Council and is part of a citywide effort to make the City of Seattle more transparent and open to our constituents. The Department of Finance and Administrative Services (FAS) has developed a process for evaluating datasets against eight principles of open data (Public.Resource.Org, 2007) and a risk analysis profile associated with publishing the data. The risk analysis determines who the final decision maker will be for deciding whether or not to publish the dataset.
The first dataset to go through this process is the City’s business license data, which is held in a SQL database and is used by the Revenue and Consumer Affairs division to collect business license data during the business license registration process. Every entity doing business within the City of Seattle limits is required to annually obtain a business license. The business license data has been requested through public disclosure and is also available on Seattle.gov in a searchable format.
A team consisting of a project manager, business data expert, technical expert, public disclosure officer, business analyst and a data.seattle.gov representative met weekly over a two-month period to discuss the value of publishing business license data to data.seattle.gov. The team followed a newly developed approach in which each table and field was evaluated against the eight principles of open data (Public.Resource.Org, 2007). The team then assessed the risk of the data under each principle and developed a risk analysis profile to assist with the final recommendation to publish or not publish the dataset. The team spent between 10 and 15 hours in meetings to develop the requirements and discuss the risk analysis. The PM spent around 40 hours developing the template, filling out the deliverable and risk analysis, and holding meetings with the Steering Committee. The development and testing of the data extract is expected to take two weeks.
To evaluate a FAS dataset, it is recommended that the following process be used. This process follows the standard software development lifecycle (SDLC) consisting of planning, analysis, design and maintenance. By following a modified SDLC approach, the project team ensures that a complete analysis of a dataset is performed, a go/no go recommendation is made to a governing body, and development and testing of a dataset occur. The analysis stage is the emphasis of this deliverable; the design and maintenance stages will be completed once the dataset is approved for publication. Since this is a newly developed process, there will be edits and revisions to the process as additional FAS datasets are analyzed.
The following diagram outlines the process for evaluating a FAS dataset for publication:
The following roles and responsibilities should be included in every FAS dataset evaluation project. This ensures that the right people are working together to provide a comprehensive recommendation to the Steering Committee.
Role |
Responsibility |
Project Manager |
Responsible for leading the dataset group through the FAS dataset evaluation process. Works with the steering committee to inform them of the status, risks and/or issues during the data set evaluation process. Responsible for developing the final recommendation deliverable and risk analysis. |
Business Owner |
Sits on the steering committee. Responsible for the decision to publish or not publish the data set based on the dataset group’s recommendations. |
Data Owner |
Sits on the steering committee. Responsible for the decision to publish or not publish the data set based on the dataset group’s recommendations. |
Data Expert |
The technical representative for the dataset. Knows how the data is derived and formatted. Provides the tables and fields to the dataset group. Provides expertise for data extraction to data.seattle.gov. |
Business Expert |
The business representative for the dataset. Knows the definitions for the tables and fields and provides the business knowledge around the usage of the tables and fields. |
Business Analysts |
The process and policy representative for the team. Provides the policy expertise to the team. |
Public Information Officer |
Provides the subject matter expertise for public information and is responsible for understanding the privacy risks and bringing them to the project manager for discussing with the steering committee. |
DoIT Data.Seattle.Gov representative |
Provides the dataset publishing expertise. |
Customer Service representative |
Provides the subject matter expertise for customer service and is responsible for understanding the customer service issues and bringing them to the core team and project manager for discussion with the steering committee. |
Dataset Teams:
The following table outlines the recommended teams needed for evaluating FAS datasets.
Team |
Responsibility |
Steering Committee |
This is the governing body who will evaluate the core team’s recommendation for publishing the data. The Steering Committee works with the project manager to resolve any issues brought up by the core team during the evaluation process. The Steering Committee should review any policies created by the core team and make recommendations back to the core team for any additional policies or processes needed prior to a final recommendation for publication. The Steering Committee should consist of the technology and data owners as well as the FAS director and the CTO (or representative of the CTO). |
Core Team |
This is the core working group for the dataset. This team walks through the dataset evaluation process and conducts an analysis on each table and field being considered for publication. The core team is responsible for drafting the Open Data Candidate Requirements and Risk Evaluation deliverable as well as the corresponding Data Field Elements and Recommendation and Table Purpose deliverables. The core team provides the requirements for the nine guiding principles and is responsible for developing additional policies and procedures if needed. The team also makes the draft recommendation on whether or not the dataset should be published. The core team consists of the Project Manager, data expert, business expert, business analysts, public disclosure officer, customer service representative, and DOIT data.seattle.gov representative. |
Decision Maker(s) |
This is the team (or person) responsible for making the final decision to publish a FAS dataset to data.seattle.gov. This team is determined by the Risk Analysis Profile. |
Fill out each requirement per principle.
Guiding Principle 1: Complete
# |
Details |
1. |
Definition: All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations. |
2. |
Requirements |
2.1 |
Business License data fields should be made available on data.seattle.gov. The data field elements process has been completed (see Appendix A). The purpose of the data tables and data fields should be clearly documented.
Fields to be published as part of the raw data set:
|
2.2 |
Exceptions: There were three general categories of fields to be excluded from the data set:
Specific Information Recommended for Exclusion:
|
3 |
Issues/Action Items |
3.1 |
There are some issues around personal information being included or excluded. It is recommended that an overall policy for addressing personal information be developed before any further datasets are published. This dataset recommends the exclusion of such information, so no policy is needed at this time. |
4. |
Existing Policies/Guidelines/Standards |
|
Yes __X__
Existing guideline established by DOIT that other agency information will not be published. |
No ____
|
5. |
Risk Scale |
|
Risk = Medium
Low = Data set to be published as is.
Medium = Some work needs to be done prior to data set being published. Some fields are recommended for exclusion.
High = All data fields contain some element of risk or too much data clean up needed prior to publishing data. |
Risk Rationale: |
Guiding Principle 2: Primary
# |
Details |
|
1. |
Definition: Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
|
|
2. |
Requirements |
|
2.1 |
Business license data should be at the detail level: Business license data is collected at the detail level.
|
|
2.2 |
Exceptions: None
|
|
3 |
Issues/Action Items |
|
3.1 |
None
|
|
4. |
Existing Policies/Guidelines/Standards |
|
|
Yes ____
|
No ____
|
5. |
Risk Scale |
|
|
Risk = Low
Low – Data is granular, not in aggregate or modified forms
Medium – Some data is in summary and would have to be broken down into detail information
High – Only summary data is available |
Risk Rationale: Business License data is granular. |
Guiding Principle 3: Timely
# |
Details |
|
1. |
Definition: Data is made available as quickly as necessary to preserve the value of the data. |
|
2. |
Requirements |
|
2.1 |
Business license data should be made available on a consistently defined timeline. The data set should include an as-of date. The timeline defined for business license data is monthly, and the data will be uploaded to data.seattle.gov on the first weekend of the month.
|
|
2.2 |
Exceptions:
|
|
3 |
Issues/Action Items |
|
3.1 |
General questions to ask for the availability of data in a timely manner:
Are transactions occurring daily? Minimal updates of transactions occur daily. Bigger updates occur annually when business licenses are up for renewal.
What do the constituents want? TBD
Is there an impact to the process or the production cycle? No impact to production, but there is an impact to batch processing, as this will be a new request for a batch extract to populate data.seattle.gov.
There is a maintenance issue with two monthly extracts: the file extracts will need to be created and maintained in a batch job. |
|
4. |
Existing Policies/Guidelines/Standards |
|
|
Yes ____
|
No __
|
5. |
Risk Scale |
|
|
Risk = Medium
Low – Data is easy to extract
Medium – Data is more difficult to extract and a scheduled data extract is needed. No or minimal impact to production cycles for application.
High – Data will be difficult to extract and impacts production cycle for application. |
Risk Rationale: This is labeled as medium risk because there will need to be two batch jobs created for publishing the data set in a complete format. |
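The monthly timeline in requirement 2.1 above can be sketched as a date calculation for the batch job. This is a minimal illustration only: it assumes "first weekend of the month" means the first Saturday, and the function names and the wording of the as-of label are hypothetical rather than part of the SLIM batch process.

```python
import datetime


def first_saturday(year, month):
    """Return the date of the first Saturday in the given month.

    Assumption: the "first weekend of the month" starts on the first Saturday.
    """
    first = datetime.date(year, month, 1)
    # date.weekday(): Monday is 0 and Saturday is 5.
    offset = (5 - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)


def as_of_label(run_date):
    """Format a hypothetical as-of label to publish alongside the extract."""
    return "Data as of " + run_date.isoformat()


if __name__ == "__main__":
    run_date = first_saturday(2010, 6)
    print(run_date, as_of_label(run_date))
```

A scheduler could compare the current date against `first_saturday` to decide whether the two monthly extracts are due.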
Guiding Principle 4: Accessible
# |
Details |
|
1. |
Definition: Data is available to the widest range of users for the widest range of purposes. |
|
2. |
Requirements |
|
2.1 |
There should not be any barriers to constituents’ ability to access the business license data. However, the minimum requirements for downloading the data should be communicated to constituents.
|
|
2.2 |
Exceptions: The team recommends creating two data sets, an Active Business License data set and a Non-Active Business License data set, because the data set as-is is too large.
|
|
3 |
Issues/Action Items
|
|
3.1 |
Size of the data sets – do we break up data sets for usability? Socrata will also provide a filtering mechanism for users to pull down specific sets of data. |
|
4. |
Existing Policies/Guidelines/Standards |
|
|
Yes ____
|
No ____
|
5. |
Risk Scale |
|
|
Risk = Medium
Low – No barriers to data set and low risk to data set usage by public
Medium – No barriers to data set and some risk to data set usage by the public
High – Barriers to data and high risk to data set usage |
Risk Rationale: Size of data is an issue and will be a barrier to some constituents’ use of the data. |
Guiding Principle 5: Machine Processable
# |
Details |
|
1. |
Definition: Data is reasonably structured to allow for automated processing
|
|
2. |
Requirements |
|
2.1 |
Socrata will publish data in machine readable formats.
|
|
2.2 |
Exceptions:
|
|
3 |
Issues/Action Items |
|
3.1 |
|
|
4. |
Existing Policies/Guidelines/Standards |
|
|
Yes ____
|
No ____
|
5. |
Risk Scale |
|
|
Risk = Low
Low – Data set structure allows for automated processing
Medium – Not all of the data set allows for automated processing (e.g., PDFs that are part of the data set)
High – None of the data set is set up for automated processing (e.g., blanket contract documents) |
Risk Rationale: |
Guiding Principle 6: Non-Discriminatory
# |
Details |
|
1. |
Definition: Data is available to anyone, with no requirement of registration
|
|
2. |
Requirements |
|
2.1 |
City of Seattle’s data.seattle.gov site does require registration.
|
|
2.2 |
Exceptions: The recommendation is to not publish mailing addresses or regulatory license information and therefore there is no need for any extra registration or policies when publishing the data.
|
|
3 |
Issues/Action Items |
|
3.1 |
If the Steering Committee decides to publish the mailing address or personal license holder information then address and privacy policies will need to be developed and added to the metadata prior to publication of data. |
|
4. |
Existing Policies/Guidelines/Standards |
|
|
Yes ____
|
No ____
|
5. |
Risk Scale |
|
|
Risk = Low
Low – General data.seattle.gov registration is required
Medium – Nature of data is sensitive enough to require additional registration to ensure privacy
High – Data set is too sensitive and would require extensive information about the users of the data. |
Risk Rationale: If the Steering Committee goes with the recommended exclusions of data, then the risk is low. |
Guiding Principle 7: Non-Proprietary
# |
Details |
|
1. |
Definition: Data is available in a format over which no entity has exclusive control
|
|
2. |
Requirements |
|
2.1 |
|
|
2.2 |
Exceptions:
|
|
3 |
Issues/Action Items |
|
3.1 |
Do we need to notify business license holders that we are publishing this data set? Not in this specific dataset because the same information is currently available on seattle.gov.
For any other FAS datasets the group is recommending a policy be created around publishing private information and notifying the affected constituents prior to publication. |
|
4. |
Existing Policies/Guidelines/Standards |
|
|
Yes ____
|
No ____
|
5. |
Risk Scale |
|
|
Risk = Low
Low – Data set available in non-proprietary format
Medium – Some of the data contained in data set is not available in non-proprietary formats
High – None of the data is available in a non-proprietary format
|
Risk Rationale: The business license data is available in a non-proprietary format and is already published on seattle.gov without prior notification to registered business license users. |
Guiding Principle 8: License Free
# |
Details |
|
1. |
Definition: Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
|
|
2. |
Requirements |
|
2.1 |
No copyright, patent, trademark or trade secret regulation on this data set.
There are some reasonable privacy issues which are being dealt with through exclusion of mailing addresses and regulatory information.
|
|
2.2 |
Exceptions:
|
|
3 |
Issues/Action Items |
|
3.1 |
The recommendation is to not publish mailing addresses or regulatory information. If this recommendation is followed then the data extract will need to include additional criteria to filter out this information. |
|
4. |
Existing Policies/Guidelines/Standards |
|
|
Yes ____
|
No ____
|
5. |
Risk Scale |
|
|
Risk = Medium
Low – Data set completely open, can be published as is.
Medium – Data must be massaged before the data set is published, and reasonable privacy, security and privilege restrictions are in question
High – Data set contains too much sensitive data to be published and reasonable privacy, security and privilege restrictions do not exist. |
Risk Rationale: Additional criteria to be added to data extract to withhold certain information. |
Guiding Principle 9: Customer Service
# |
Details |
|
1. |
Definition: A contact person must be designated to respond to constituents trying to use the data.
A contact person must be designated to respond to constituent complaints about violations of privacy or violations of the principles.
An administrative or judicial court must have the jurisdiction to review whether or not FAS has applied these principles appropriately. |
|
2. |
Requirements |
|
2.1 |
The first point of contact for Business License data is Revenue and Consumer Protection. The division contact information will be published on the metadata page. If there is something that is outside of RCA’s responsibility, then RCA will pass it on to a FAS Public Information Office or Public Disclosure representative.
All technical issues or questions will be handled by Socrata.
|
|
2.2 |
Exceptions:
|
|
3 |
Issues/Action Items |
|
3.1 |
Is there a web form for constituents to fill out on the data.seattle.gov site to get questions answered? No, but DOIT does provide a metadata form which will be filled out and show RCA as the contact.
|
|
4. |
Existing Policies/Guidelines/Standards |
|
|
Yes ____
|
No ____
|
5. |
Risk Scale |
|
|
Risk = Low
Low – Data set contains no sensitive information and will require minimum customer service care
Medium – Data set contains some sensitive information but can be handled internally if issues arise
High – Data set being published can be seen as highly sensitive, will require special care when published and DFAS needs to notify Executives and Law to review the guiding principles to ensure no legal issues. |
Risk Rationale: If the Steering Committee accepts the recommended additional criteria to screen out mailing addresses and personal license holder information, then the data set is not sensitive and will require minimum customer service care. |
Include a brief description of the risk associated to this data set and the final risk profile chart.
The risk analysis profile shows the publication of the Business License data to be a medium risk. This is primarily due to the higher risks in the Complete, Timely, Accessible and License Free guidelines. The team’s decision to exclude regulatory license information and break the dataset into two datasets adds to the risks.
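The profile described above can be tallied from the nine Risk Scale ratings recorded in this document. The following is a minimal sketch in Python. Taking the highest individual rating as the overall risk is an assumption, since this document does not state how the overall rating is derived, and the function names are illustrative only.

```python
from collections import Counter

# Per-principle ratings as recorded in the Risk Scale rows of this document.
RATINGS = {
    "Complete": "Medium",
    "Primary": "Low",
    "Timely": "Medium",
    "Accessible": "Medium",
    "Machine Processable": "Low",
    "Non-Discriminatory": "Low",
    "Non-Proprietary": "Low",
    "License Free": "Medium",
    "Customer Service": "Low",
}

# Ordering used to compare ratings.
LEVELS = {"Low": 0, "Medium": 1, "High": 2}


def overall_risk(ratings):
    """Return the highest rating present (an assumed aggregation rule)."""
    return max(ratings.values(), key=LEVELS.__getitem__)


def elevated_principles(ratings):
    """List the principles rated above Low, in the order they were entered."""
    return [name for name, level in ratings.items() if LEVELS[level] > 0]


if __name__ == "__main__":
    print(Counter(RATINGS.values()))
    print(overall_risk(RATINGS))
    print(elevated_principles(RATINGS))
```

Under this assumed rule the overall rating is Medium, driven by the Complete, Timely, Accessible and License Free principles, which agrees with the profile stated above.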
Data team recommendation to the Steering Committee for moving forward or not moving forward with publishing data set on data.seattle.gov
The Business License data team recommends publishing the business license data with the exception of mailing address and regulatory information. The excluded information consists of the mailing address and the data fields from the Person Table (see Appendix A). The team has reviewed the tables and fields in the SLIM database and recommends that a monthly extract be developed to output the data to data.seattle.gov in two extracts. The first extract will contain all current open business license holders and the second will contain the historical data. Each of the extracts will be refreshed on a monthly basis.
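The recommended extract can be sketched as follows. This is a minimal illustration under stated assumptions, not the SLIM batch job itself: the field names (`mailing_address`, `regulatory_license_type`, `license_status`), the `ACTIVE` status value and the function names are all hypothetical, and the actual fields to exclude are the ones listed in Appendix A.

```python
import csv
import datetime
import io

# Hypothetical field names; the real SLIM columns are listed in Appendix A.
EXCLUDED_FIELDS = {"mailing_address", "regulatory_license_type"}
STATUS_FIELD = "license_status"
ACTIVE_VALUE = "ACTIVE"


def build_extracts(rows, as_of):
    """Split rows into active and historical extracts without excluded fields."""
    active, historical = [], []
    for row in rows:
        is_active = row.get(STATUS_FIELD) == ACTIVE_VALUE
        # Keep only fields that are not on the exclusion list.
        public = {k: v for k, v in row.items() if k not in EXCLUDED_FIELDS}
        # Stamp every record with the as-of date of this extract.
        public["as_of_date"] = as_of.isoformat()
        (active if is_active else historical).append(public)
    return active, historical


def to_csv(rows):
    """Serialize a list of dicts to CSV text (empty string when no rows)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"business_name": "A", "license_status": "ACTIVE", "mailing_address": "x"},
        {"business_name": "B", "license_status": "EXPIRED", "mailing_address": "y"},
    ]
    active, historical = build_extracts(sample, datetime.date(2010, 5, 1))
    print(to_csv(active))
    print(to_csv(historical))
```

Filtering by an explicit exclusion list keeps the withheld fields visible in one place, which makes the extract easier to review against the Appendix A recommendations.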
The total cost of the initial project to develop a process and use the business license data as the first data set was around $12,000, which consists of City labor costs totaling 242 hours from concept to completion and publication of the data to data.seattle.gov.
We, the undersigned decision makers, have reviewed this document and approve of the Go/No Go Dataset Publishing Recommendations and the deliverable:
Executive Sponsors: |
|
Fred Podesta, Dept. of Finance and Administrative Services (FAS) Director |
684-3181 |
Signature: |
Date: |
Bill Schrier, Chief Technology Officer, Dept. of Information Services |
684-0633 |
Signature: |
Date: |
Bryon Tokunaga, Business Technology Director, FAS |
684-0543 |
Signature: |
Date: |
Denise Movius, Revenue and Consumer Protection Director, FAS |
684-9259 |
Signature: |
Date: |
I, the undersigned decision maker, have reviewed this document and the risk analysis and approve of the Go/No Go Dataset publishing recommendation and agree to publish the data on data.seattle.gov:
Executive Decision Maker: |
|
Fred Podesta, Dept. of Finance and Administrative Services (FAS) Director |
684-3181 |
Signature: |
Date: |
Version # |
Revised Date |
Description |
1.8 |
04/29/2010 |
Final document distributed to Steering Committee |
1.9 |
05/03/2010 |
Incorporated final comments from Steering Committee and added a new section showing total project costs and clarified the exclusion of mailing address and regulatory information from the Person Table in SLIM. |
Appendix A: Data Field Elements and Recommendation
Shows all the tables and fields associated with the dataset and provides decision makers with a recommendation on whether or not fields should be included in the published data set.
Appendix B: SLIM Business License Table Purposes
Describes the purpose of each table used for publishing in the raw data set.
Include if there is a go decision. Show a high level schedule for publishing the data set, define who the data set contacts are, and decided publishing timeline (weekly, monthly, etc).
i (Public.Resource.Org, 2007)