Open Data Candidate Requirements and Risk Evaluation

Open Data Candidate Information

Data Candidate:

Business License Data

Data Description:

Business license data consists of the data fields collected during the business license registration process. Every entity doing business within the City of Seattle limits is required to obtain a business license annually. See SMC 5.30.030 for more detailed information.

Business Database Owner:

Denise Movius, Revenue and Consumer Affairs

Technical Owner:

Vicki Childs, BT Applications Group

Technology:

SQL Database

Potential Audience:

City of Seattle Constituents

Current Audience:

SLIM business license data is used by a variety of groups within Revenue and Consumer Affairs. Several fields in the SLIM database are used for internal purposes only.







Table of Contents

Background
Process for Evaluating FAS Datasets
Dataset Evaluation Process Diagram
Roles and Responsibilities
Guiding Principles Requirements
Guiding Principle 1: Complete
Guiding Principle 2: Primary
Guiding Principle 3: Timely
Guiding Principle 4: Accessible
Guiding Principle 5: Machine Processable
Guiding Principle 6: Non-Discriminatory
Guiding Principle 7: Non-Proprietary
Guiding Principle 8: License Free
Guiding Principle 9: Customer Service
Risk Assessment
Go/No Go Decision
Total Cost of Project
Acceptance
Data Publishing Agreement
Document Revisions
Appendix A: Data Field Elements and Recommendation
Appendix B: SLIM Business License Table Purposes
Appendix C: Data Publishing and Maintenance Plan
Appendix D: MetaData Form



Background

The FAS Open Data Candidate Requirements and Risk Evaluation document is a response to a request from the City of Seattle’s Chief Information Officer, Bill Schrier, to City departments to make data available on data.seattle.gov. This request has the backing of both the Mayor and Council and is part of a citywide effort to make the City of Seattle more transparent and open to our constituents. The Department of Finance and Administrative Services (FAS) has developed a process for evaluating datasets against eight principles of open data [i], together with a risk analysis profile associated with publishing the data. The risk analysis determines who the final decision maker on publishing the dataset should be.


The first dataset to go through this process is the City’s business license data, which is held in a SQL database that the Revenue and Consumer Affairs division uses to collect business license data during the business license registration process. Every entity doing business within the City of Seattle limits is required to obtain a business license annually. The business license data has been requested through public disclosure and is also available on Seattle.gov in a searchable format.


A team consisting of a project manager, business data expert, technical expert, public disclosure officer, business analyst and a data.seattle.gov representative met weekly over a two-month period to discuss the value of publishing business license data to data.seattle.gov. The team followed a newly developed approach in which each table and field was evaluated against the eight principles of open data [i]. The team then assessed the risk of the data under each principle and developed a risk analysis profile to assist with the final recommendation to publish or not publish the dataset. The team spent between 10 and 15 hours in meetings to develop the requirements and discuss the risk analysis. The PM spent around 40 hours developing the template, filling out the deliverable and risk analysis, and holding meetings with the Steering Committee. The development and testing of the data extract is expected to take two weeks.

Process for Evaluating FAS Datasets

The following process is recommended for evaluating an FAS dataset. It follows the standard software development lifecycle (SDLC), consisting of planning, analysis, design and maintenance. By following a modified SDLC approach, the project team ensures that a complete analysis of the dataset is performed, that a go/no go recommendation is made to a governing body, and that development and testing of the dataset occur. This deliverable emphasizes the analysis stage; the design and maintenance stages will be completed once the dataset is approved for publication. Because this is a newly developed process, it will be edited and revised as additional FAS datasets are analyzed and go through it.


The following diagram outlines the process for evaluating a FAS dataset for publication:



Dataset Evaluation Process Diagram





Roles and Responsibilities

The following roles and responsibilities should be included in every FAS dataset evaluation project. This ensures that the right people are working together to provide a comprehensive recommendation to the Steering Committee.


Role | Responsibility
Project Manager (or Dataset Coordinator) | Responsible for leading the dataset group through the FAS dataset evaluation process. Works with the steering committee to inform them of the status, risks and/or issues during the data set evaluation process. Responsible for developing the final recommendation deliverable and risk analysis.
Business Owner | Sits on the steering committee. Responsible for the decision to publish or not publish data set based on dataset group’s recommendations.
Data Owner | Sits on the steering committee. Responsible for the decision to publish or not publish data set based on dataset group’s recommendations.
Data Expert | The technical representative for the dataset. Knows how the data is derived and formatted. Provides the tables and fields to the dataset group. Provides expertise for data extraction to data.seattle.gov.
Business Expert | The business representative for the dataset. Knows the definitions for the tables and fields and provides the business knowledge around the usage of the tables and fields.
Business Analysts | The process and policy representative for the team. Provides the policy expertise to the team.
Public Information Officer | Provides the subject matter expertise for public information and is responsible for understanding the privacy risks and bringing them to the project manager for discussing with the steering committee.
DoIT Data.Seattle.Gov representative | Provides the dataset publishing expertise.
Customer Service representative | Provides the subject matter expertise for customer service and is responsible for understanding the customer service issues and bringing them to the core team and project manager for discussion with the steering committee.



Dataset Teams:

The following table outlines the recommended teams needed for evaluating FAS datasets.

Team | Responsibility
Steering Committee | This is the governing body that will evaluate the core team’s recommendation for publishing the data. The Steering Committee works with the project manager to resolve any issues brought up by the core team during the evaluation process. The Steering Committee should review any policies created by the core team and make recommendations back to the core team for any additional policies or processes needed prior to a final recommendation for publication. The Steering Committee should consist of the technology and data owners as well as the FAS director and the CTO (or representative of the CTO).
Core Team | This is the core working group for the dataset. This team walks through the dataset evaluation process and conducts an analysis on each table and field being considered for publication. The core team is responsible for drafting the Open Data Candidate Requirements and Risk Evaluation deliverable as well as the corresponding Data Field Elements and Recommendation and Table Purpose deliverables. The core team provides the requirements for the nine guiding principles and is responsible for developing additional policies and procedures if needed. The team also makes the draft recommendation on whether or not the dataset should be published. The core team consists of the Project Manager, data expert, business expert, business analysts, public disclosure officer, customer service representative, and DOIT data.seattle.gov representative.
Decision Maker(s) | This is the team (or person) responsible for making the final decision to publish a FAS dataset to data.seattle.gov. This team is determined by the Risk Analysis Profile.



Guiding Principles Requirements

Fill out each requirement for each principle.

Guiding Principle 1: Complete


#

Details

1.

Definition:

All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.

2.

Requirements

2.1

Business License data fields should be made available on data.seattle.gov – completed the data field elements process (see Appendix A).

The purpose of the data tables and data fields should be clearly documented:
See Appendix A and B


Fields to be published as part of the raw data set:


Display Order | Field | Description
1 | business_legal_name | Legal Name
2 | trade_name | DBA of the business
3 | ownership_type_id | Type of Business – SP, PT, LLC, etc.
4 | business_id | Customer Number – business license number.
5 | lic_start_date | Date business was opened
6 | naics_code | Code identifying the type of business
7 | SIC_code_id | Code identifying the type of business
8 | SIC_description | Description of a SIC code
9 | lic_type_exp_date | Date the license expires (e.g., a business license expires on 12/31)
10 | license_type_code | A two character code for each license type
11 | license_description | Description of the license (e.g., business license, taxi license, amusement device license)
12 | renewal_date | Due Date of each license renewal
13 | tax_rptg_cd_id | Whether the business files taxes quarterly, annually or monthly
14 | house_number |
15 | house_suffix |
16 | street_address |
17 | street_prefix |
18 | phone_number | The area code plus a seven digit local number
18 | street_name |
19 | street_suffix |
20 | street_type |
21 | unit_number |
22 | city_state_zip |
23 | country_id |
24 | internet_address_nbr | Business website
25 | bus_loc_id | If more than one location (different branches) – SLIM assigns additional ids
26 | close_date | Close Date of the Location
27 | actual_license_fee_charged | The "actual" fee that was charged for a license
28 | utility_only | Indicates that business reports utility only
29 | gambling_only | Gambling only account.
30 | vendor_nbr | If they have a City of Seattle vendor number
31 | bankruptcy_date | Date business went bankrupt
31 | half_yr_cutoff_date | Date beyond which a 50% reduction fee can be applied to a new license
32 | bus_action_code_id | Tied to Enforcement Module – Citations, etc.
32 | late_fee | If there is a "late fee" charge that can be associated with a license (e.g., personal and regulatory licenses can have late fees applied)
33 | hold_on_renewal_flag | Tells the batch renewal process whether to re-open the required holds at license renewal. Some types need a fire/health check every year, for example.
34 | first_name |
35 | last_name |
36 | middle_initial |
37 | bus_close_date | Date business closed
38 | business_status_id | Identify if business is closed, open, revoked, etc.
39 | lic_end_date | Date Business was closed
40 | license_status_id | Status of the license – this field will be used to break down the extract into two separate extracts.


2.2

Exceptions:

There were three general categories of fields to be excluded from the data set:

  1. Internal use only fields

  2. System generated fields

  3. Personal data fields

Specific Information Recommended for Exclusion:

  1. Mailing Address – privacy concern for use in mass mailings. Only publishing the physical address which is based on place of business location and the basic contact information contained on the business license.

  2. Regulatory Personal Information – privacy concern around personal information being released. A regulatory license holder also has business license information which will be published but the team is recommending not publishing the regulatory information contained in the Person Table.

3

Issues/Action Items

3.1

There are some issues around whether personal information should be included or excluded. It is recommended that an overall policy for addressing personal information be developed before any further datasets are published. This dataset recommends excluding such information, so no policy is needed at this time.

4.

Existing Policies/Guidelines/Standards


Yes __X__


Existing guideline established by DOIT that other agency information will not be published.

No ____


5.

Risk Scale


Risk = Medium

Low = Data set to be published as is.

Medium = Some work needs to be done prior to data set being published. Some fields are recommended for exclusion.

High = All data fields contain some element of risk or too much data clean up needed prior to publishing data.

Risk Rationale:
There are enough excluded data fields to categorize the risk as medium.
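
The field selection in requirement 2.1 and the exclusions in 2.2 can be illustrated with a minimal sketch. It assumes extract rows are available as dictionaries and uses an allow-list, so any field not explicitly approved is dropped. Only a subset of the approved fields is shown; the function name `to_published_row`, the sample values, and the excluded field names `mailing_address` and `last_updated_by` are hypothetical and do not come from SLIM.

```python
# Approved field names in display order (a subset of the list in requirement 2.1).
PUBLISHED_FIELDS = [
    "business_legal_name",
    "trade_name",
    "ownership_type_id",
    "business_id",
    "lic_start_date",
    "naics_code",
    "license_status_id",
]


def to_published_row(source_row):
    """Keep only approved fields; anything not on the list is dropped."""
    return {name: source_row.get(name) for name in PUBLISHED_FIELDS}


# Hypothetical source row. 'mailing_address' and 'last_updated_by' stand in for
# excluded personal and internal-use fields; they are not real SLIM column names.
sample = {
    "business_legal_name": "Example Coffee LLC",
    "trade_name": "Example Coffee",
    "ownership_type_id": "LLC",
    "business_id": "000000",
    "lic_start_date": "2009-01-15",
    "naics_code": "000000",
    "license_status_id": "A",
    "mailing_address": "PO Box 0000",
    "last_updated_by": "batch_user",
}

published = to_published_row(sample)
dropped = sorted(set(sample) - set(published))
print(dropped)
```

An allow-list is used here rather than a list of fields to remove, so that a column added to the source later is not published until it has been reviewed.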


Guiding Principle 2: Primary


#

Details

1.

Definition:

Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.


2.

Requirements

2.1

Business license data should be at the detail level: Business license data is collected at the detail level.


2.2

Exceptions:

None



3

Issues/Action Items

3.1

None


4.

Existing Policies/Guidelines/Standards


Yes ____



No ____


5.

Risk Scale


Risk = Low

Low – data is granular, not in aggregate or modified forms

Medium – some data is in summary and would have to be broken down into detail information

High – Only summary data is available

Risk Rationale: Business License data is granular.




Guiding Principle 3: Timely


#

Details

1.

Definition:

Data is made available as quickly as necessary to preserve the value of the data.

2.

Requirements

2.1

Business license data should be made available on a consistently defined timeline. Data set should include an as-of-date.

Timeline defined for business license data is monthly and will be uploaded to data.seattle.gov on the first weekend of the month.




2.2

Exceptions:



3

Issues/Action Items

3.1

General questions to ask for the availability of data in a timely manner:

Are transactions occurring daily? Minimal updates of transactions occur daily. Bigger updates occur annually when business licenses are up for renewal.

What do the constituents want? TBD

Is there an impact to the process or the production cycle? There is no impact to production, but there is an impact to batch processing because this will be a new request for a batch extract to populate data.seattle.gov. There is also a maintenance issue with two monthly extracts: the file extracts will need to be created and maintained in a batch job.

4.

Existing Policies/Guidelines/Standards


Yes ____



No __



5.

Risk Scale


Risk = Medium

Low - data is easy to extract

Medium – data is more difficult to extract and a scheduled data extract is needed. No or minimal impact to production cycles for application.

High – data will be difficult to extract and impacts production cycle for application.

Risk Rationale:

This is labeled as medium risk because there will need to be two batch jobs created for publishing the data set in a complete format.
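
The monthly timeline and as-of-date in requirement 2.1 can be sketched as follows. This is an illustration only: it assumes "the first weekend of the month" means the first Saturday, and the function names and the `as_of_date` column name are hypothetical.

```python
import datetime


def first_saturday(year, month):
    """Return the first Saturday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday(): Monday is 0, Saturday is 5.
    days_ahead = (5 - first.weekday()) % 7
    return first + datetime.timedelta(days=days_ahead)


def stamp_as_of(rows, as_of):
    """Return copies of the rows with an ISO-formatted as_of_date column added."""
    return [dict(row, as_of_date=as_of.isoformat()) for row in rows]


run_date = first_saturday(2010, 6)
stamped = stamp_as_of([{"business_id": "000000"}], run_date)
print(run_date, stamped)
```

Stamping every row with the same date lets a constituent tell which monthly refresh a downloaded file came from.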




Guiding Principle 4: Accessible


#

Details

1.

Definition:

Data is available to the widest range of users for the widest range of purposes.

2.

Requirements

2.1

There should not be any barriers to constituents’ ability to access the business license data. However, the minimum requirements for downloading data should be given to constituents.




2.2

Exceptions:

The team recommends creating two data sets because the data set as-is is too large: an Active Business License data set and a Non-Active Business License data set.



3

Issues/Action Items

3.1

Size of the data sets – do we break up data sets for usability?

Socrata will also provide a filtering mechanism for users to pull down specific sets of data.

4.

Existing Policies/Guidelines/Standards


Yes ____



No ____



5.

Risk Scale


Risk = Medium

Low – No barriers to data set and low risk to data set usage by public

Medium – No barriers to data set and some risk to data set usage by the public

High – Barriers to data and high risk to data set usage

Risk Rationale:

Size of data is an issue and will be a barrier to some constituents’ use of the data.




Guiding Principle 5: Machine Processable


#

Details

1.

Definition: Data is reasonably structured to allow for automated processing


2.

Requirements

2.1

Socrata will publish data in machine readable formats.



2.2

Exceptions:



3

Issues/Action Items

3.1


4.

Existing Policies/Guidelines/Standards


Yes ____



No ____



5.

Risk Scale


Risk = Low

Low – Data set structure allows for automated processing

Medium – Not all of the data set allows for automated processing (e.g., PDFs are part of the data set)

High – None of the data set is set up for automated processing (e.g., blanket contract documents)

Risk Rationale:
Handled by Socrata




Guiding Principle 6: Non-Discriminatory


#

Details

1.

Definition: Data is available to anyone, with no requirement of registration


2.

Requirements

2.1

City of Seattle’s data.seattle.gov site does require registration.


2.2

Exceptions:

The recommendation is to not publish mailing addresses or regulatory license information and therefore there is no need for any extra registration or policies when publishing the data.


3

Issues/Action Items

3.1

If the Steering Committee decides to publish the mailing address or personal license holder information then address and privacy policies will need to be developed and added to the metadata prior to publication of data.

4.

Existing Policies/Guidelines/Standards


Yes ____



No ____



5.

Risk Scale


Risk = Low

Low – General data.seattle.gov registration is required

Medium – Nature of data is sensitive enough to require additional registration to ensure privacy

High – Data set is too sensitive and would require extensive information about the users of the data.

Risk Rationale: If the Steering Committee goes with the recommended exclusions of data, then the risk is low.


Guiding Principle 7: Non-Proprietary


#

Details

1.

Definition: Data is available in a format over which no entity has exclusive control


2.

Requirements

2.1




2.2

Exceptions:



3

Issues/Action Items

3.1

Do we need to notify business license holders that we are publishing this data set? Not in this specific dataset because the same information is currently available on seattle.gov.


For any other FAS datasets the group is recommending a policy be created around publishing private information and notifying the affected constituents prior to publication.

4.

Existing Policies/Guidelines/Standards


Yes ____



No ____


5.

Risk Scale


Risk = Low

Low – Data set available in non-proprietary format

Medium – Some of the data contained in data set is not available in non-proprietary formats

High – None of the data is available in non-proprietary format


Risk Rationale: The business license data is available in a non-proprietary format and is already published on seattle.gov without prior notification to registered business license users.



Guiding Principle 8: License Free


#

Details

1.

Definition: Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.


2.

Requirements

2.1

No copyright, patent, trademark or trade secret regulation on this data set.


There are some reasonable privacy issues which are being dealt with through exclusion of mailing addresses and regulatory information.



2.2

Exceptions:



3

Issues/Action Items

3.1

The recommendation is to not publish mailing addresses or regulatory information. If this recommendation is followed then the data extract will need to include additional criteria to filter out this information.

4.

Existing Policies/Guidelines/Standards


Yes ____



No ____



5.

Risk Scale


Risk = Medium

Low – Data set completely open, can be published as is.

Medium – Data will need to be massaged before the data set is published, and it is questionable whether reasonable privacy, security and privilege restrictions are in place

High – Data set contains too much sensitive data to be published and reasonable privacy, security and privilege restrictions do not exist.

Risk Rationale:

Additional criteria to be added to data extract to withhold certain information.


Guiding Principle 9: Customer Service


#

Details

1.

Definition:

A contact person must be designated to respond to constituents trying to use the data.


A contact person must be designated to respond to constituent complaints about violations of privacy or violations of the principles.


An administrative or judicial court must have the jurisdiction to review whether or not FAS has applied these principles appropriately.

2.

Requirements

2.1

The first point of contact for Business License data is Revenue and Consumer Protection. The division contact information will be published on the metadata page. If something is outside of RCA’s responsibility, RCA will pass it on to an FAS Public Information Office or Public Disclosure representative.


All technical issues or questions will be handled by Socrata.


2.2

Exceptions:



3

Issues/Action Items

3.1

Is there a web form for constituents to fill out on the data.seattle.gov site to get questions answered? No, but DOIT does provide a metadata form which will be filled out and show RCA as the contact.


4.

Existing Policies/Guidelines/Standards


Yes ____



No ____



5.

Risk Scale


Risk = Low

Low – Data set contains no sensitive information and will require minimum customer service care

Medium – Data set contains some sensitive information but can be handled internally if issues arise

High – Data set being published can be seen as highly sensitive, will require special care when published and DFAS needs to notify Executives and Law to review the guiding principles to ensure no legal issues.

Risk Rationale: If the Steering Committee accepts the recommended additional criteria to screen out mailing addresses and personal license holder information, then the data set is not sensitive and will require minimum customer service care.

Risk Assessment

Include a brief description of the risk associated with this data set and the final risk profile chart.


The risk analysis profile shows the publication of the Business License data to be a medium risk. This is primarily due to the higher risks in the Complete, Timely, Accessible and License Free guidelines. The team’s decision to exclude regulatory license information and break the dataset into two datasets adds to the risks.
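
The profile described above can be reproduced from the Risk Scale entries recorded under each guiding principle. The sketch below tallies those ratings and lists the principles rated above Low. The rule used for the overall rating (the highest rating given to any single principle) is an assumption for illustration; this document does not state how the overall rating is derived.

```python
from collections import Counter

# Ratings as recorded in the Risk Scale entry of each guiding principle.
RATINGS = {
    "Complete": "Medium",
    "Primary": "Low",
    "Timely": "Medium",
    "Accessible": "Medium",
    "Machine Processable": "Low",
    "Non-Discriminatory": "Low",
    "Non-Proprietary": "Low",
    "License Free": "Medium",
    "Customer Service": "Low",
}

# Count how many principles fall at each level.
tally = Counter(RATINGS.values())

# Principles rated above Low, in document order.
elevated = [name for name, rating in RATINGS.items() if rating != "Low"]

# Assumed rule, for illustration only: overall rating = highest single rating.
ORDER = ["Low", "Medium", "High"]
overall = max(RATINGS.values(), key=ORDER.index)

print(dict(tally), elevated, overall)
```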






Go/No Go Decision

Data team recommendation to the Steering Committee for moving forward or not moving forward with publishing the data set on data.seattle.gov.


The Business License data team recommends publishing the business license data with the exception of mailing address and regulatory information. Regulatory information that will be excluded is the mailing address and the data fields from the Person Table (see Appendix A). The team has reviewed the tables and fields in the SLIM database and recommends a monthly extract be developed to output the data to data.seattle.gov in two extracts. The first extract will be all current open business license holders and the second will be the historical data. Each of the extracts will be refreshed on a monthly basis.
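
The split into a current extract and a historical extract can be sketched as a simple partition on `license_status_id`, the field identified in requirement 2.1 for this purpose. The status value `"ACTIVE"`, the function name `split_extract` and the sample rows are hypothetical; the actual SLIM status codes are not given in this document.

```python
# Hypothetical status code for an open license; the real SLIM values of
# license_status_id are not listed in this document.
ACTIVE_STATUSES = {"ACTIVE"}


def split_extract(rows):
    """Partition rows into (active, historical) lists by license_status_id."""
    active, historical = [], []
    for row in rows:
        if row.get("license_status_id") in ACTIVE_STATUSES:
            active.append(row)
        else:
            historical.append(row)
    return active, historical


sample_rows = [
    {"business_id": "000001", "license_status_id": "ACTIVE"},
    {"business_id": "000002", "license_status_id": "CLOSED"},
    {"business_id": "000003", "license_status_id": None},
]

active_rows, historical_rows = split_extract(sample_rows)
print(len(active_rows), len(historical_rows))
```

In this sketch a row with a missing or unrecognized status falls into the historical extract, so every source row lands in exactly one of the two files.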



Total Cost of Project

The total cost of the initial project to develop a process and use the business license data as the first data set was around $12,000, which consists of City labor costs totaling 242 hours from concept to completion and publication of the data to data.seattle.gov.

Acceptance:


We, the undersigned decision makers, have reviewed this document and approve of the Go/No Go Dataset Publishing Recommendations and the deliverable:


Executive Sponsors:

Fred Podesta, Dept. of Finance and Administrative Services (FAS) Director

684-3181

Signature:

Date:

Bill Schrier, Chief Technology Officer, Dept. of Information Services

684-0633

Signature:

Date:

Bryon Tokunaga, Business Technology Director, FAS

684-0543

Signature:

Date:

Denise Movius, Revenue and Consumer Protection Director, FAS

684-9259

Signature:

Date:







Data Publishing Agreement:


I, the undersigned decision maker, have reviewed this document and the risk analysis and approve of the Go/No Go Dataset publishing recommendation and agree to publish the data on data.seattle.gov:



Executive Decision Maker:

Fred Podesta, Dept. of Finance and Administrative Services (FAS) Director

684-3181

Signature:

Date:







Document Revisions


Version # | Revised Date | Description
1.8 | 04/29/2010 | Final document distributed to Steering Committee
1.9 | 05/03/2010 | Incorporated final comments from Steering Committee and added a new section showing total project costs and clarified the exclusion of mailing address and regulatory information from the Person Table in SLIM.








Appendix A: Data Field Elements and Recommendation

Shows all the tables and fields associated with the dataset and gives decision makers a recommendation on whether or not each field should be included in the published data set.


Appendix B: SLIM Business License Table Purposes
Describes the purpose of each table used for publishing in the raw data set.

Appendix C: Data Publishing and Maintenance Plan

Include if there is a go decision. Show a high-level schedule for publishing the data set, define who the data set contacts are, and state the chosen publishing timeline (weekly, monthly, etc.).


Appendix D: MetaData Form

[i] Public.Resource.Org, 2007