Jimym, if I've understood you, you are considering reading certain fields without parsing the entire file by writing your own parser. You cannot do this in JSON. You cannot do this in XML. If you attempt to do it, it'll probably break in some circumstances.
Why bother? Parsing JSON is pretty fast. And once parsed, all these manipulations are easy.
Also, why bother checking the keys? The documentation literally tells you all the ones of interest.
The documentation is not as thorough as it seems, and it's not reliable enough. The order of the data doesn't match, some fields are sometimes omitted at random in practice, and an argument may have multiple propositions, which isn't mentioned. There is also redundancy that throws off the parsing, especially with multiple propositions. It doesn't exhaustively list all the types of data (e.g. edge types), and most notably it says nothing about the parameters of concepts beyond the 4th, which are important to me: I need to reconstruct a list of those parameters too.
That's why I freaked out initially.
I'm also not a real programmer with formal training. Basic first-year material like parsing, sorting, databases, etc., things that just need rote learning of basic patterns, is exactly what's hard for me. I only know programming because I like it (I'm a designer first, though).
I found out how to get at the meat after running a series of checks on the sample data all night,
i.e.: ignore the first element, take the 2nd to 5th elements, and ignore everything beyond that, where the instability happens (assuming this generalizes to all the data).
Problem: I leave out the surface text because it's harder to parse and sits in the unstable area, but it's important too.
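A minimal sketch of that field selection, assuming the 5.3 CSV dump is tab-separated with one edge per line (the sample line in this thread shows the columns separated by whitespace, so the tab delimiter is an assumption) and that the columns follow the order of the annotated example. The function name `core_fields` is hypothetical. Rather than cutting everything past the 5th column, it also picks up the surface text when the 10th column is present and non-empty, and pads short lines with None instead of failing.

```python
def core_fields(line: str, sep: str = "\t") -> dict:
    """Pick the stable columns out of one ConceptNet CSV edge line.

    Assumes the column order of the annotated example: uri, relation,
    parameter 1, parameter 2, context, weight, source, id, dataset,
    surface text. The separator (tab by default) is an assumption.
    """
    fields = line.rstrip("\n").split(sep)
    # Skip column 0 (the assertion uri) and keep the 2nd to 5th columns,
    # padding with None when a line is shorter than expected.
    relation, start, end, context = (fields[1:5] + [None] * 4)[:4]
    # The surface text is the 10th column in the example; it may be absent.
    surface = fields[9] if len(fields) > 9 and fields[9] else None
    return {
        "relation": relation,
        "start": start,
        "end": end,
        "context": context,
        "surface_text": surface,
    }
```

Padding instead of raising means a malformed line shows up as None values you can count later, which fits the goal of checking how stable the data really is.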
Regarding JSON vs. XML, that's not it. I want to move freely through the structure so I can compare its consistency, ideally without having to write a full parser (i.e. by using existing libraries). The idea is that the JSON should be more consistent than the CSV they provide, because fields are labelled and would be null instead of omitted. The records are also likely to keep the same structure across all the data, which puts the burden on the structure of the values instead of the whole data.
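With an existing library (Python's standard `json` module here), that consistency check can be a short loop. This is a sketch that assumes the dump stores one JSON object per line (JSON Lines); if it is one big array instead, `json.load` the file and loop over the list. The function name `profile_records` is hypothetical. It counts how often each set of top-level keys occurs and which value types each key carries, so omitted fields and null fields both become visible.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def profile_records(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Summarise the shape of JSON records, one object per line.

    Returns (shapes, value_types):
      shapes      counts each distinct set of top-level keys,
      value_types counts (key, type name) pairs; nulls show up as "NoneType".
    """
    shapes: Counter = Counter()
    value_types: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # The frozenset of key names identifies the record's "shape".
        shapes[frozenset(record)] += 1
        for key, value in record.items():
            value_types[(key, type(value).__name__)] += 1
    return shapes, value_types
```

If `shapes` ends up with a single entry, every record carries the same keys, and any remaining irregularity lives in the values, which is what you are hoping for.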
edit:
One data line:
/a/[/r/Antonym/,/c/en/abdomen/,/c/en/torso/] /r/Antonym /c/en/abdomen /c/en/torso /ctx/all 0.014355292977070055 /s/site/verbosity /e/0d01785baa93174b791991457c86f313a18054b8 /d/verbosity [[abdomen]] is not [[torso]]
CSV line parsed and annotated manually (green = desirable):
uri: /a/[/r/Antonym/,/c/en/abdomen/,/c/en/torso/]
relation: /r/Antonym
parameter 1: /c/en/abdomen
parameter 2: /c/en/torso
context: /ctx/all
weight: 0.014355292977070055
source: /s/site/verbosity (can be complex, with nested data and /or/ and /and/ operators)
id: /e/0d01785baa93174b791991457c86f313a18054b8
dataset: /d/verbosity
surface text: [[abdomen]] is not [[torso]]
features?
licenses?
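The source column annotated above can nest /or/ and /and/ operators. Judging only from the samples in this thread, the syntax is `/or/[` or `/and/[`, then items separated by `/,`, closed by `/]`. A small recursive-descent parser handles that without guessing positions. This is a sketch built on that inferred grammar, not a documented ConceptNet format; `parse_source` and `leaves` are hypothetical names.

```python
import re

# A leaf ends at the next "/," (item separator) or "/]" (end of list).
_LEAF_END = re.compile(r"/[,\]]")
_OPERATORS = {"/or/[": "or", "/and/[": "and"}


def parse_source(text: str):
    """Parse a compound source URI into nested (operator, children) tuples.

    Plain URIs such as "/s/site/verbosity" are returned unchanged.
    """
    pos = 0

    def parse_expr():
        nonlocal pos
        for prefix, op in _OPERATORS.items():
            if text.startswith(prefix, pos):
                pos += len(prefix)
                children = [parse_expr()]
                # Further children are separated by "/,".
                while text.startswith("/,", pos):
                    pos += 2
                    children.append(parse_expr())
                if not text.startswith("/]", pos):
                    raise ValueError(f"expected '/]' at position {pos}")
                pos += 2
                return (op, children)
        # Not an operator: read a plain URI up to the next "/," or "/]".
        match = _LEAF_END.search(text, pos)
        end = match.start() if match else len(text)
        leaf = text[pos:end]
        pos = end
        return leaf

    tree = parse_expr()
    if pos != len(text):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return tree


def leaves(tree):
    """Flatten a parsed tree into its plain source URIs."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [uri for child in children for uri in leaves(child)]
```

Raising on malformed input, instead of silently returning a partial result, makes unexpected variants in the data easy to spot during a full pass.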
Also, I'm parsing 5.3, but they have now moved to 5.4.
edit:
You can also have multiple data lines for the same concept with minor variations (in the surface text, for example).
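One way to collapse those near-duplicates is to group records by a shared key and then list what actually varies. This sketch assumes the records are already parsed into dicts keyed like the JSON sample, and it treats "the same concept" as "the same assertion `uri`", which is an assumption; swap in another key (e.g. `start`) if that matches your data better. `group_by_uri` and `variants` are hypothetical names.

```python
from collections import defaultdict


def group_by_uri(records):
    """Group edge records (dicts) that share the same assertion uri.

    Groups keep first-seen order, and records keep their input order.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["uri"]].append(record)
    return dict(groups)


def variants(records, field):
    """Distinct values of `field` within one group, in first-seen order.

    A missing field is reported as None, so omissions are visible too.
    """
    seen = []
    for record in records:
        value = record.get(field)
        if value not in seen:
            seen.append(value)
    return seen
```

The list-based `variants` avoids requiring values to be hashable, since some fields (such as `features` or `sources`) are lists.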
JSON sample:
{
"context": "/ctx/all",
"dataset": "/d/conceptnet/4/en",
"end": "/c/en/drink_water",
"features": [
"/c/en/cat /r/CapableOf -",
"/c/en/cat - /c/en/drink_water",
"- /r/CapableOf /c/en/drink_water"
],
"id": "/e/54e238cbf42cb02560abee949ff39ce8eeafde92",
"license": "/l/CC/By-SA",
"rel": "/r/CapableOf",
"source_uri": "/or/[/and/[/s/activity/omcs/omcs1_possibly_free_text/,/s/contributor/omcs/clburke/]/,/and/[/s/activity/omcs/omcs1_possibly_free_text/,/s/contributor/omcs/cralize/]/,/and/[/s/activity/omcs/omcs1_possibly_free_text/,/s/contributor/omcs/meganraby/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/chaizzilla/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/craleb/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/dragonjools/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/kurt_woloch/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/leighman/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/logjac/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/mcandag1/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/rossjesse/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/rspeer/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/scarfboy/]/]",
"sources": [
"/s/activity/omcs/omcs1_possibly_free_text",
"/s/activity/omcs/vote",
"/s/contributor/omcs/chaizzilla",
"/s/contributor/omcs/clburke",
"/s/contributor/omcs/craleb",
"/s/contributor/omcs/cralize",
"/s/contributor/omcs/dragonjools",
"/s/contributor/omcs/kurt_woloch",
"/s/contributor/omcs/leighman",
"/s/contributor/omcs/logjac",
"/s/contributor/omcs/mcandag1",
"/s/contributor/omcs/meganraby",
"/s/contributor/omcs/rossjesse",
"/s/contributor/omcs/rspeer",
"/s/contributor/omcs/scarfboy"
],
"start": "/c/en/cat",
"surfaceEnd": "drink water",
"surfaceStart": "Cats",
"surfaceText": "[[Cats]] can [[drink water]]",
"uri": "/a/[/r/CapableOf/,/c/en/cat/,/c/en/drink_water/]",
"weight": 4.523561956057013
}
Notice that it uses a weird order (alphabetical, not by conceptual closeness) and that the naming shifts completely for whatever reason, so you have to guess which field is which. But I don't need to parse the surface text anymore!
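The renaming can be undone once, in one table. The mapping below is inferred by matching the JSON sample's values against the annotated CSV columns (e.g. `rel` holds what the CSV calls the relation, `start`/`end` hold parameters 1 and 2, `source_uri` holds the source). It is not taken from the ConceptNet documentation. `relabel` is a hypothetical name; it returns the fields in the CSV's conceptual order, uses None for anything missing, and keeps JSON-only keys (such as `features`, `license`, `sources`, `surfaceStart`, `surfaceEnd`) under `extra`.

```python
# JSON key -> CSV label, in the CSV column order.
# Inferred from the samples in this thread, not from official docs.
CSV_ORDER = [
    ("uri", "uri"),
    ("rel", "relation"),
    ("start", "parameter 1"),
    ("end", "parameter 2"),
    ("context", "context"),
    ("weight", "weight"),
    ("source_uri", "source"),
    ("id", "id"),
    ("dataset", "dataset"),
    ("surfaceText", "surface text"),
]
_KNOWN = {key for key, _ in CSV_ORDER}


def relabel(record: dict) -> dict:
    """Return a JSON edge record keyed by CSV labels, in CSV column order.

    Missing keys become None; JSON-only keys are kept under "extra".
    """
    # Dicts preserve insertion order (Python 3.7+), so this keeps CSV order.
    out = {label: record.get(key) for key, label in CSV_ORDER}
    out["extra"] = {k: v for k, v in record.items() if k not in _KNOWN}
    return out
```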