Opening Data in Chengdu For All

My name is Bruce. I recently moved to Chengdu, and I am optimistic about open data and civic technology in China.

For the last five years I worked for the City of Houston, and among the many projects I worked on was the launching of Houston’s open data and open innovation hackathon initiatives. I worked with really awesome people and had transformative leaders in Mayor Annise Parker, Council Member Ed Gonzalez, and CFO Kelly Dowe. We took an experimental approach, and not everything worked perfectly, but we were able to empower people to make a difference by extending access to data.

I now live in Chengdu, having moved here with my wife in December, and I love the city already. The food is delicious, the people are great, and I love the community. Since I’ve been here, I have made it my mission to learn about open data and civic innovation in China.

Moving here I wondered if I would be able to continue my work on open data. Would the government be interested? Would it be okay? Some people told me that I shouldn’t even mention open data or civic engagement once we got here. Luckily, there were just as many others who said otherwise, and since arriving I’ve learned that open data and civic engagement are alive and growing.

Introduction to Open Data

For the uninitiated, “open data” and “civic innovation” describe a type of public initiative, often government or non-profit led, to collect useful datasets and distribute them online in machine-readable formats. The goal of these projects is to fuel entrepreneurs and researchers by increasing data accessibility, and to improve public institutions by increasing transparency and civic participation.

When data is made available in a centralized location in a readily usable format, businesses, citizens, and researchers can use that data to create value-added services, applications, and analyses. Developments in GPS location services are just one example of the success of governments opening data to create services and value.
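To make “machine-readable” a little more concrete, here is a minimal sketch in Python. The dataset, its column names, and the rider counts are all made up for illustration; no real portal publishes this exact file. The point is only that a few lines of code can turn a structured file into an answer, which is not possible with a printed book or a scanned image.

```python
import csv
import io

# Hypothetical sample: a tiny "machine-readable" dataset in CSV form.
# The route numbers and rider counts are invented for illustration only.
SAMPLE_CSV = """route,district,daily_riders
1,Jinjiang,12000
2,Wuhou,8500
3,Jinjiang,4300
"""


def riders_by_district(csv_text):
    """Sum the daily_riders column per district from CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        district = row["district"]
        totals[district] = totals.get(district, 0) + int(row["daily_riders"])
    return totals


if __name__ == "__main__":
    print(riders_by_district(SAMPLE_CSV))
```

The same function would work unchanged on a much larger file with the same columns, which is why a common, structured format matters more than the size of any single release.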

Chinese Characteristics

Naturally, open data and civic engagement initiatives are new to China and come with Chinese characteristics, but that’s no different than the newness of open data in Houston with its Texas characteristics. As in the West, the idea of connecting citizens and government in order to improve public services is very real in China. I’m optimistic about the prospects for open data because I’ve talked to many people here who recognize the social benefits, and I see progress coming from civic technologists and the government.

Maybe I am optimistic because I see many of the same opportunities that I saw in the US, albeit on a larger scale. I also see the challenges, and they are likewise familiar. Below are some things I’ve seen written about why “China and open data can never get along with one another.” If you know the story of open data in the West, you’ll recognize some themes:

  • “The government doesn’t put any real effort into open data”
  • “The government data is not trustworthy or accurate”
  • “The government doesn’t release information that could make it look bad”
  • “The data isn’t available, isn’t in one place, isn’t defined, and isn’t machine readable”
  • “The government subject matter experts cannot be found. No one can answer questions”
  • “The government doesn’t like the term ‘hackers’, which has a negative connotation”

All these comments resonate with my experience in Houston. I’ve heard them for years from my peers in government and civic tech circles in the US as well. The West may be ahead in terms of open data and civic tech, but maybe not that far ahead. On all sides there is a lot of work to be done.

What’s Happening in China

For starters, as Andy Liu commented in the talk Exploring Open Data in China, maybe China already has open data, but it goes by a different name. Perhaps there IS open data in China; it’s just scattered and hard to find.

The Sichuan Province Fact Book is a good example. It has enormous amounts of useful commercial information, but it is not in a machine-readable format. As a book it’s not particularly accessible, but it’s not hidden. There is similar statistical information published online (in places like wenku.baidu.com) but most of it is in Chinese, making it difficult for Westerners to access.

China’s Data Portals

The Chinese National Government, along with Provincial and Municipal governments, is launching data portals. Two national sites I found are the National Bureau of Statistics and Public Information Online. They’re not easy for a Westerner to navigate, but it’s a start. There are also city data portals. Beijing Data, Data Shanghai, and Hong Kong’s Open Gov Project are the standouts I found. Again, not perfect, but a good start.

China’s Smart City Open Data Platform is also very interesting. The government recently rolled out its Love City Platform in a handful of municipalities, including Qingdao, and is planning implementation in ~30 more cities. That’s a BIG initiative, and if it can be done successfully across jurisdictions, it would be a significant achievement.

These activities are supported by conversations I’ve had with Chinese government delegations in the US who were looking to understand open data and its ties to innovation. I’m pretty bullish on Chinese open data, similar to the Open Knowledge Foundation’s Feng Gao, who is a leader in advocating for and tracking open data in China. Like Gao, I see iterative progress that can be built upon, and I see the same powerful arguments that convinced my government colleagues in the US also having traction in China. Open data promotes economic productivity and innovation, and goes far beyond transparency.

Community Interest Leads to Data

Cui Anyong: Hacks/Hackers Beijing co-organizer. Co-founder of djchina.org. Open Knowledge Foundation China Ambassador.

Open data alone is not enough, however. There must be civic and business interest in the data, and there must be mutual value for the government/non-profit and external parties. In China there is certainly external interest. As reported by Rebecca Chao in The Hunt for Open Data in China, open data and civic tech were initially led by the community. Community members, businesses and developers are requesting the data they need. When they don’t get it, they’re finding innovative ways to obtain the data, like Cui Anyong scraping JPEGs to get water quality data (kudos).

I believe open data is also generating goodwill. Bu Shujian learned that available Chinese government data was not necessarily inaccurate or fake. As Chao reported, she was interested in comparing air quality reporting by the Chinese and US Governments, and was skeptical of the data from the Chinese. Interestingly, she found the differences were due to different calculations and the Chinese Government’s use of more data points from across the city (which could mean their reporting was more accurate).

China Hackathons

What’s more, there are hackathons in China too! The Chinese programming community recognizes the potential value of coordinated communal effort, and people are rallying together over weekends on projects just like in the West. Derived from a combination of the words “hack” and “marathon”, “hackathons” are events where software developers, designers, and business people come together to create technological solutions to solve a specific problem or advance a specific cause. The hackathon movement has been expanding in the US in recent years, and I was excited to hear that it is catching on in China as well. At one of my first dinners in China, a friend in Beijing told me about the Cleanweb Hackathon they hosted just months earlier.

Finally, momentum continues to build in China as individuals passionate for open data innovation are seeing progress and also connecting and sharing ideas. The launch of the Open Data China network during International Open Data Day is a perfect example of this.

Open Data in Chengdu

I believe in the missions of the Code for America and Open Houston projects that I worked on while in the US. We engaged government with citizens and businesses to improve services, budgeting, and quality of life. People and organizations were empowered to work together to build tools that could enable better lives. I believe open data and civic engagement are also of great value here in China – and many people are clearly embracing these opportunities already.

I think there are many potential benefits for Chengdu. We can use open data to tell the exciting story of our city and to promote economic activity and investment. It makes business sense to locate a new business in a city with a wealth of information available about it. Open data supports the growth of local entrepreneurs and creates opportunities for new tech companies.

Moreover, open data can be used by creative individuals and organizations to build applications that enhance quality of life. Chengdu is one of China’s largest cities and has a booming technology community, so I know we have lots of software developers and designers who would love to spend some time on “civic hacking” to build apps that improve the city.

Great progress is being made in China, and I hope to work with others here to bring open data to Chengdu. In the meantime, I’m learning a new language … and it’s not one for computer programming!

This story was originally posted to the Code for America blog. The content has been modified to provide additional context and further definition of open data and civic technology for Chengdu Living.

Bruce was also interviewed by Code for America last summer about his involvement in open data in Texas. It’s a good read; check it out here: Spotlight: Bruce Haupt

23 thoughts on “Opening Data in Chengdu For All”

  1. Interesting topic and post – we’re lucky to have someone with your skill set join us in Chengdu and embrace this cause which has such obvious benefit to everyone here.

    I was surprised to see you mention the data portals already existing in China. The only data metric that I watch on a regular basis is air pollution and temperature, but there are so many more beneficial ones as you mention. Access to updated transportation information like train and bus lines would be really convenient.

    Having more access to data would benefit everyone in the city, especially since so much data is obscured to us for language and practical reasons. I have high hopes for this.

    By the way, someone created a forum post just today about Open Data in China, which seems to be coincidental. Here’s a link: Open Data in Chengdu

  2. 12 months from now: bitter expat launches racist tirade against Chinese government for not being like the one he’s used to at home. Can’t understand why they don’t see the wisdom of highly educated whites. Still uses Chinese characters in the middle of his English for no reason.

    • I assume you’re talking about controversial data points that China may have a reason to hide, but I believe the “not possible in China” argument was well addressed in this article.

      Most of the data we’re talking about is not controversial and serves to benefit both the government and residents of Chengdu. Things like updated GPS information, bus routes, availability of fiber optic internet, etc. Much of this information is already publicly available (like bus routes) but it’s just not published anywhere online in a usable format.

      But to ask you a question: what do you make of the Chinese government publishing regularly updated air pollution information? I think there is no doubt that the demand for information is rising and that government can not only not suppress it, but will play an increasingly large role in making it available to everyone.

      • Data may not be controversial. But information and knowledge derived from data can be. One piece of data may not be controversial, but several pieces correlated together may be.

        Several decades ago, I was given a tape of a house’s power usage recorded every 15 minutes. After some playing around, I could tell when the household got up, made coffee, and cooked supper, and how often their furnace turned on. I could correlate that with the atmospheric temperature and actually use how often the furnace was on to deduce the outside temperature.

        • As Uncle Ben said in Spiderman, “With great power comes great responsibility.” 🙂

          I completely agree with you Bill, and this is another example of the potential power of big data, and open data, depending on the data that’s released.

          This is absolutely one of the reasons why governments are very concerned about what exactly constitutes open data, and which of the data they have should be released, particularly as they must consider legal, health, privacy, and public safety impacts. So the question is, which data could be used in a negative or harmful way, and which, when released, is likely to be used for positive, value-added results?

          Obviously there are grey areas, but we avoid releasing the former data and aspire to release the latter. Then… there are also the heaps of data that are just kind of boring, benign, and likely only to clutter up an open data initiative and potentially make it more difficult to find the gems.

          It’s not as simple as just saying open everything. There is a lot of thought that needs to go into these initiatives…

    • There’s a very clear reason why Chinese is injected into his article that even the author refers to: he’s learning the language.

      I think most would agree the best method for language learning is use, especially in situations where you’re not expected to.

  3. Open Data sounds nifty but when talked about in this light for a simpleton such as myself, what sort of work are we talking about to make it accessible?

    It’s sort of like the “Cloud” when that became a popular tech buzzword. Everyone talked about this magical ethereal collection of storage but how it functioned made little sense.

    Who collects data? What method of standardization is there? What is the implication of having specific data portals? What real world applications might this have? Such as what platforms could this be used on? What platforms could be created? What sort of access might people have to it?

    It leaves little room for a troglodyte like myself.

    For instance you say: I think there are many potential benefits for Chengdu. We can use open data to tell the exciting story of our city and to promote economic activity and investment.

    How? Can open data also be used to spawn unicorns that shoot rainbow lasers out of their eyes? Apologies for the crisp sarcasm but when I noted the thread on the forum it was a subject whose implications fascinated me and this article seemed like icing on the cake but does little to clarify for the non-tech-savvy what applications this might functionally serve. Please enlighten me!

    • We can spend an enormous amount of time talking about the possibilities for open data, along with the long list of questions, challenges, legal mumbo jumbo. For your first set of questions, almost all of them are “It depends on context.” I’m happy to discuss more at the meetup (see forum post).

      If you’re really interested in the subject, I’d recommend the open source and freely available “Beyond Transparency” that was published by Code for America about 4-5 months ago: http://beyondtransparency.org/ There are stories and case studies galore from the people who have led efforts in cities around the world.

      For real world examples of potential benefits in Chengdu:
      – Bus Route Mapping (my favorite): Take the geographic datasets for all the bus routes and create an app/map that will enable foreigners to better use this form of transit. Good for a few definitely, and also helps in terms of making Chengdu more international friendly, which could make it more attractive for investment
      – Commercial/Residential data visualization: Build data visualizations of Chengdu’s residential and commercial activity (potentially mapping it) to both show what’s going on and where people might be effective at starting new businesses. If it’s linked to open data, it can be updated as the data is updated. E.g., where should I locate the next Starbucks or iPhone factory in Chengdu – accessible and easy to understand information would be helpful.

      In the end… to me… it’s about enabling/empowering people who can enlighten you, me and others with their knowledge and skills.

  4. Fascinating read, Bruce, and welcome to Chengdu! I first began geeking around with data after Hans Rosling made it so damn sexy with his 2007 Ted Talk.

    Can you post the links to the Open Houston Project? I was having a bit of trouble pinning it down and figure might as well ask the man himself.

    It seems to me that if the primary issue is machine readability, no sweat. I guess that suggests that you know where the numbers you’re interested in are. If you’re alright with aggregating government info, this feels extremely possible and not even that tough, by which I mean not too many obstacles beyond the already difficult nature of the job. If you’re looking into numbers that the government isn’t publishing, that ought to become tricky. I imagine that distorted numbers are distorted for a reason. The pollution numbers, for example, bear no real political consequence in that the whole world knows China has horrific air quality. No news. But if child mortality rates were way different than what the government was releasing or there was a disparity in spending trends… You get the picture. Someone would probably become interested in your earnest project.

    In trying to visualize what project you’d like to make, I think it would be valuable to see what you’ve made in the past. Possible?

    As long as you study hard, you can learn Chinese. Keep at it! You can do it!

      • Impeccable timing. I’m going to start planting Charlie prompts around the site to check out your efficiency. Joke (….)

        All of Rosling’s videos are excellent, but this one really started it off for me. He understands what it takes to translate data into a format that people can understand. Human limits seem to be bubbles and squares.

        • Right, making the data understandable and usable is the real trick. I did some very small-time data gathering this time last year when everyone was freaking out about the pollution. My approach was completely manual, but it was linked to a lot as people around China (and around the world) were all talking about China’s pollution problem in Q1 of 2013. Here’s a link: Chengdu Air

    • Hey Zak,

      I’m with you – I loved Hans’ visualizations. They’re still highlighted pretty frequently for both their content as well as the example they showcase in terms of format/presentation. I always enjoy ’em!

      In terms of Open Houston and its projects, they can be found in several places. The community organization can be found here: http://ohouston.org/ The beginnings of their work with the city can be found here: http://performance.houstontx.gov/

      The 311 Data Visualizations and Budget Bootcamp projects on the city site were both actually built through the open data hackathons. Link to 311: http://performance.houstontx.gov/311Dashboards Budget bootcamp link: http://performance.houstontx.gov/budgetbootcamp

      We’ve had links in a bunch of other places too, including some news articles, that showed some of the other projects that included several quality of life style apps. Let me know if you want me to search around for them.

      To your questions about machine readability, data aggregation, etc. All of it is “theoretically easy”, but generally it’s A LOT of detail oriented work if you want it done right… and A LOT of back and forth with whoever owns the data to make sure you understand it, have the right stuff, etc. Your points on data the gov wants/doesn’t want to release are well taken and accurate.

      For all the above issues (and more), if I were to do this again (especially from the community side, with fewer resources, less money, and no data of my own to make available), I’d want to start as simply as possible with just a couple of datasets. That would confirm both that there’s actual interest in doing something with the data (if the community finds it interesting but doesn’t want to do anything with it, then it’s not worth the effort) and that we can actually obtain and effectively load up a couple of datasets.

      As a project from my perspective, that’s what I’m interested in doing: seeing if people care about this idea and would want to do it, seeing if we have some relevant data, and then seeing if anyone actually will use it. If we can get a handful of datasets and one good example project built with it, then we’ve got something to build off and potential credibility. If not, no worries… better that we pilot it first though.

      Final thing in terms of full disclosure: I’m not a developer and so I haven’t been building apps myself. Can I play around with data and do some damage in a SQL database? Hell yes, and I can also build some cool data visualizations in Excel or Tableau. Sadly, though, I’m not a programmer – I’m a data nerd, business analyst, and project manager. :-/

  5. Hey there,

    Yeah I am also fascinated by data and the patterns and quirky little conclusions (as well as the standard useful ones) you can make by bringing different data sets together.

    For me the actual structure of any open data platform here in Chengdu, as well as the process which would lead to one, are most important. We talk as if all of this data were accessible and just needed a couple hands to “organize and interpret” – I think the reality is that this data is mostly in the hands of government offices, and in some instances government-connected (and condoned) NGOs.

    I can’t think of anything truly useful that wouldn’t require either 1) an army of researchers or 2) excellent cooperation with the government. Probably both.

    Data in *some* western societies is pretty easy to come by, and governments tend to want to help out. Here though, one example: traffic statistics are controlled by the Public Security Bureau, due to the sensitive nature of the number of deaths etc.

    And any real portal, useful or otherwise, would garner attention at some point or another.

    THIS DOES NOT HAVE TO BE A BAD THING.

    Anyway, the other idea would be to have bus routes and such, like a glorified Google Map, but it would be useful only to a small group of foreigners. We could add it to our already existing ChengduPlaces perhaps, and make that a more robust service for lost laowai.

    The wider audience, the Chinese audience, would roll their eyes at the idea that this information – or such services – don’t already exist in Chinese. Which they do.

    Like you said Bruce, context is the key.

    • Pre-read summary: Blah blah blah, something something something. I rambled a bit… Sorry!

      Agree with your points about getting the data, although it doesn’t have to take too much to get started. It requires finding the person out there who already geeks out on this and already knows where some of the datasets are publicly available, AND/OR it takes a few people taking the time to find a handful of datasets that they find interesting. Option 2 has often taken the form of a hackathon or meetup where people spend a few hours or a day talking first about what would be interesting and then spending a few hours poking around to find and organize the data.

      I’d push back on the mention of data being easier (generally) to come by in western societies. They’re getting better as is the Chinese government, but most of these initiatives are just filled with so many issues that aren’t out in the open. There is still a TON to be done in even the standout open data initiatives in terms of quality of the data, accessibility of it, etc. We (in the west) are also very good at not releasing datasets we deem sensitive for all manner of legal and non-legal reasons.

      I could very well be very wrong about the above since I still need to better understand the situation here in China. I can say without hesitation, though, that the open data initiatives elsewhere (even the best) still have a lot to do. When I see articles written about China that reference how successful the west is… I just cringe a little since where we are successful in the west now is often still at a superficial level – at least broadly – although there are definitely big successes on specific projects, just as there are here.

      Anyway, perhaps we should start a separate forum topic on open data sets, and general quality of life topics, that would be interesting. Transit/transportation definitely seems to be up there for people…

  6. Data sustainability, accuracy and reliability are always issues with any data project. Open Data projects rely on the data collection, management and distribution agencies for all of these. Without direct control over these issues, Open Data projects should assess and publish these measures when publishing data for public use. One way to do it is to use multiple data sources to obtain not just the main dataset, but also surrogate datasets that can be used to verify the data’s accuracy. That’s why the more projects of this kind there are, the more vulnerable the source agencies are to being identified as a source of “inaccurate” or “unreliable” data, which is considered an opportunity to undermine the authority of the government.

  7. Hi Bruce,

    I was born and raised in Chengdu. I am so excited and thankful that you are bringing open data to Chengdu. With the maker movement heating up in China and IBM’s new smart city project landing in Chengdu, I think the government may approve and support it. I wish you the best of luck and a wonderful time in Chengdu!

    Linna

