Author Topic: General thread for quick questions  (Read 136025 times)
BorisTheBrave
« Reply #240 on: September 13, 2015, 11:49:20 PM »

Jimmy - you are (now) reading the files just fine. It's failing to *print* them. Windows console support isn't so great for unicode.

https://wiki.python.org/moin/PrintFails should sort you out.

Or you could just write to files, rather than print, seeing as you've already got file reading/writing working.
Logged
Layl
professional jerkface
« Reply #241 on: September 13, 2015, 11:51:10 PM »

nope

The problem has nothing to do with Unicode; you just forgot to escape the backslash again.
"\U" is treated by Python as the start of an escape sequence for putting a Unicode character in a string. What you need is an actual backslash followed by a U, so: "\\U".
Logged
Dacke
« Reply #242 on: September 13, 2015, 11:51:42 PM »

Quote
By default, the console in Microsoft Windows only displays 256 characters (cp437, or "Code page 437", the original IBM-PC 1981 extended ASCII character set.)

Holy shit  Facepalm
Logged

programming • free software
animal liberation • veganism
anarcho-communism • intersectionality • feminism
gimymblert
The archivest master, leader of all documents
« Reply #243 on: September 14, 2015, 12:04:42 AM »

@layl
I use r"text"; it automatically makes it "raw text" and avoids this problem.
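
For anyone landing here later, a small sketch of the spellings discussed above. The path is made up for illustration; it isn't one from this thread.

```python
# Hypothetical Windows path, spelled three ways (not a path from this thread).
escaped = "C:\\Users\\user\\notes.txt"   # every backslash doubled
raw = r"C:\Users\user\notes.txt"         # raw string: backslashes are literal
forward = "C:/Users/user/notes.txt"      # forward slashes, which Windows generally accepts

# The escaped and raw spellings produce the same string.
same = escaped == raw

# In a normal (non-raw) literal, backslash + capital U starts an 8-hex-digit
# escape for a single Unicode code point.
smiley = "\U0001F600"
code_point_count = len(smiley)

print(same, code_point_count)
```

The raw form is usually the least error-prone for hard-coded Windows paths, with the caveat that a raw string can't end in a single backslash.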

@dacke
Surely, but I had no problem with Blitz3D, which seems to work more like a pass-through instead of bothering me with errors.

I'm learning the horror of Unicode, and this project has led me to the horror of working with someone else's data. I'll surely figure it out in the long run, but it jeopardizes a lot of things. I'll try to look for clues in the original file or elsewhere.

I'll also need to see if I can safely isolate and discard the faulty data; it might not be useful for my main original goal (although I had hoped to go beyond it, given the richness of the data).

Ambitious things that were secondary:
- using it as a limited translator mixing a system like PSO1 and Valve's L4D/TF2 (point-and-name tagged objects) --> jeopardized
- an analogy system built on generalization --> still possible on a limited dataset
- instant quiz --> limited
- Scribblenauts-like creation with aided prompts (the system uses the data structure to preselect the parts to assemble)

That's not as ambitious as it sounds; I was aiming for toy-level complexity, not solid AI-like functionality.

That said, they have a Python interface using a different database; I have yet to mine the other database (I left JSON behind). I wanted to arrange it into something I understand and not deal with their inconsistencies. FAIL XD

edit:
@Boris

Thanks I'm looking into it as soon as possible!
Logged

Dacke
« Reply #244 on: September 14, 2015, 12:16:02 AM »

Quote
Surely, but I had no problem with Blitz3D, which seems to work more like a pass-through instead of bothering me with errors.

I'm not exactly sure what you're referring to, but not getting error messages is horrible. If you want to ignore an exception in python, you have to say so explicitly (just like almost all other modern languages). Just having it fail silently leads to horrible bugs that can be almost impossible to find.

Code:
except:
    pass

edit: but please don't do this just to hide a bug you don't fully understand; that will mess things up royally

edit 2: also, catching any and all exceptions will lead to problems. The accepted answer explains why and shows what to do:
http://stackoverflow.com/questions/730764/try-except-in-python-how-do-you-properly-ignore-exceptions
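
A minimal sketch of the narrower alternative: catch only the exception you expect and let everything else propagate. `parse_count` is a made-up helper for illustration, not something taken from the linked answer.

```python
def parse_count(text, default=0):
    """Return int(text), or default if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        # Only the failure we anticipate is swallowed; a TypeError from
        # passing None, for example, still surfaces as an error.
        return default

print(parse_count("12"))
print(parse_count("twelve"))
```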

Quote
I'm learning the horror of Unicode

Uuuuhhh... I do hope you mean "blessing". Everything before unicode and especially before utf8 was a friggin' mess. As demonstrated by the fact that the Windows console can't display the character ń (unicode character \u0144). And the fact that your program went to hell because you had a latin1 file, instead of relying on a universal standard (i.e. utf8).

I think you've gotten confused about what unicode is and how it works in Python. But I can't really tell what you believe so I can't really correct you, either.

If you give me a link to the file in question and tell me what data you want I can probably write a program for you in no time, if you feel too stuck
« Last Edit: September 14, 2015, 05:45:55 AM by Dacke » Logged

Dacke
« Reply #245 on: September 14, 2015, 12:25:33 AM »

Here is a nice high-level history and explanation from Tom Scott at Computerphile, UTF-8 and the unicode miracle:


Logged

gimymblert
« Reply #246 on: September 14, 2015, 11:27:11 PM »

Quote
Jimmy - you are (now) reading the files just fine. It's failing to *print* them. Windows console support isn't so great for unicode.

https://wiki.python.org/moin/PrintFails should sort you out.

Or you could just write to files, rather than print, seeing as you've already got file reading/writing working.

didn't work either

Code:
file = open(r"C:\Users\user\Documents\#1 ConceptNet Relations\dictionary.txt","r",encoding="utf-8")

unique_lines = set()

for line in file:
   unique_lines.add(line.strip())
unique_lines = list(unique_lines)

test_file = open("pyTest.txt", "wb")

for item in unique_lines:
    test_file.write("%s\n" % bytes(item, 'UTF-8'))

test_file.close()

I tried this and many variations of it.

I always got:

Code:
C:\Python34\python.exe C:/Users/user/PycharmProjects/hellopython/ParseDictionaryToUnique.py
Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/hellopython/ParseDictionaryToUnique.py", line 28, in <module>
    test_file.write("%s\n" % bytes(item, 'UTF-8'))
TypeError: 'str' does not support the buffer interface

Process finished with exit code 1

The type named in the error isn't always str; it depends on the line of code.

The internet tells me I must cast to bytes, which I did ...
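
A guess at what is going on, sketched with an in-memory buffer instead of a real file: a file opened with "wb" only accepts bytes, and formatting a bytes object with "%s" gives back a str (the repr of the bytes), which would explain why the cast doesn't help. The sample word is made up.

```python
import io

item = "je\u015b\u0107"          # sample word with non-ASCII letters (illustrative)

# "%s" % bytes(...) is still a str: it holds the repr of the bytes object.
formatted = "%s\n" % bytes(item, "utf-8")
formatted_is_str = isinstance(formatted, str)

binary_file = io.BytesIO()        # stands in for open(..., "wb")
try:
    binary_file.write(formatted)  # str into a binary stream
    write_failed = False
except TypeError:
    write_failed = True

# Encoding the whole line at the end yields bytes, which a binary stream accepts.
binary_file.write((item + "\n").encode("utf-8"))
written = binary_file.getvalue()
print(formatted_is_str, write_failed, written)
```

The simpler route is probably to skip bytes altogether: open the file in text mode with an explicit encoding and write plain str.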
« Last Edit: September 14, 2015, 11:44:44 PM by Jimym GIMBERT » Logged

Dacke
« Reply #247 on: September 14, 2015, 11:42:20 PM »

What are you trying to achieve with all that raw/byte stuff?

I'm not sure if you're running into more Windows quirks, but just using UTF-8 in every step without specifying anything works perfectly for me:

Code:
file = open('utf8lines.txt')

unique_lines = set()

for line in file:
   unique_lines.add(line.strip())

outfile = open('unique-lines.txt', 'w')

for line in unique_lines:
   outfile.write('%s\n' % line)

Quote
The internet tells me I must cast to bytes, which I did ...

In certain cases you may need to. In this case... no. At least not on *nix, who knows with Windows.
« Last Edit: September 14, 2015, 11:51:25 PM by Dacke » Logged

gimymblert
« Reply #248 on: September 14, 2015, 11:51:20 PM »

 Cry

Code:
C:\Python34\python.exe C:/Users/user/PycharmProjects/hellopython/ParseDictionaryToUnique.py
Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/hellopython/ParseDictionaryToUnique.py", line 28, in <module>
    test_file.write("%s\n" % item)
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u015b' in position 7: character maps to <undefined>

Process finished with exit code 1

That's your code (almost) verbatim. I had tried that before, but with "wb" instead of "w".
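
The traceback goes through cp1252.py, which suggests the output file was opened with the Windows default encoding rather than UTF-8. A sketch of that difference, assuming nothing beyond the standard library; the temporary file name is arbitrary.

```python
import os
import tempfile

text = "\u015b"   # the character from the traceback above

# cp1252 has no slot for this character, so encoding to it fails.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Passing encoding="utf-8" on both write and read avoids the platform default.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w", encoding="utf-8") as outfile:
    outfile.write(text + "\n")
with open(path, "r", encoding="utf-8") as infile:
    round_trip = infile.read().strip()

print(cp1252_ok, round_trip == text)
```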
Logged

Dacke
« Reply #249 on: September 14, 2015, 11:53:29 PM »

Well remove the "b", then!

You're dealing with UTF-8 text, not raw bytes. Every character takes anywhere from 1 to 4 bytes, so you can't treat a UTF-8 string as single bytes representing single characters.
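
To make the variable width concrete, a quick check of how many bytes a few characters take in UTF-8. The sample characters are arbitrary picks.

```python
samples = ["a", "\u015b", "\u20ac", "\U0001F600"]   # a, ś, €, and an emoji

# Length of each character once encoded as UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]

# Each sample is still exactly one character (code point) as a Python str.
char_lengths = [len(ch) for ch in samples]

print(byte_lengths, char_lengths)
```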
Logged

gimymblert
« Reply #250 on: September 15, 2015, 12:09:25 AM »

I did; I explained myself incorrectly. I had tried this format earlier with the b. Looking at your code, I removed the cast (again) and the b to see if there was a difference; there was none. It's either the buffer error or the codec error Sad. I'm already looking for other solutions, but now I'm kind of aimless.
Logged

gimymblert
« Reply #251 on: September 15, 2015, 12:14:17 AM »

I also noticed that when re-running the program and looking at the target txt ... it doesn't reset the position (certainly from exiting early without closing the file). DAMN, it's a minefield.
Logged

Dacke
« Reply #252 on: September 15, 2015, 12:22:43 AM »

If the code I gave you doesn't work on your computer then I can't really help you. I have no issues with the character ś (\u015b), which you get a UnicodeEncodeError for.

I assume it's more Windows BS. Maybe the default encoding is set to something weird, so perhaps you can explicitly set it to UTF-8 in every step?

I do remember Windows being generally finicky when it comes to this kind of programming. Bad file handling, weird encoding problems, etc. Maybe someone who uses Windows can help you out.
Logged

gimymblert
« Reply #253 on: September 15, 2015, 08:31:42 AM »

I already did try the cast to UTF8

However, I currently have a set of files encoding semantic relations: each text file is one relation, and each line holds tab-separated arguments, where an argument can be 1 to n comma-separated (",") elements. So a record is <name of file><argument 1><tab><argument 2><end of line>.

Example: say you have the relation "fox > is a > mammal", i.e. isA(fox, mammal).
It would be file = isA, line = fox [tab] mammal.

The bad thing is that if I could use Python and sets, I would be able to put arg1 into one set, arg2 into another, and extract all roots (arguments that only exist in arg2) and leaves (arguments that only exist in arg1) in a single operation.
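
A rough sketch of that idea with plain Python sets, using a few made-up isA lines in the tab-separated format described above. Comma-separated multi-element arguments are ignored here for brevity.

```python
# Made-up sample of an isA file: "<arg1>\t<arg2>" per line.
lines = [
    "fox\tmammal",
    "cat\tmammal",
    "mammal\tanimal",
]

arg1_values = set()
arg2_values = set()
for line in lines:
    arg1, arg2 = line.rstrip("\n").split("\t")
    arg1_values.add(arg1)
    arg2_values.add(arg2)

# Set difference does the extraction in one step each.
roots = arg2_values - arg1_values    # appear only as second argument
leaves = arg1_values - arg2_values   # appear only as first argument

print(sorted(roots), sorted(leaves))
```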

If anyone is interested I made the files available here:
https://dl.dropboxusercontent.com/u/24530447/divers/%231%20ConceptNet%20Relations.rar

I'll try a different take later.
Logged

Dacke
« Reply #254 on: September 15, 2015, 08:52:21 AM »

I didn't quite follow.

I understand the format of the files. They're pretty darn silly, though.

Desires.txt
Code:
america	bunch_of_war
ant find_picnic_basket
beer respect
child bounce_tennis_and_golf_ball
child_might eat_banana

But what is it you're trying to do with this data?
Logged

gimymblert
« Reply #255 on: September 15, 2015, 09:09:08 AM »

I'm toying with it. The format is just because I was parsing a bigger set of data (see my previous meltdown with Python) to toy with it and see what I could do. It's used for semantic analysis and various other things. I'm sure some goofiness in the data is to be expected, as it's compiled from different sources; one source is a human computation game (think CAPTCHA for semantic relations).

http://conceptnet5.media.mit.edu/
Quote
Sources and how to contribute
Previous versions of ConceptNet were a home-grown crowd-sourced project, where we ran a Web site collecting facts from people who came to the site. The Web of Data is much bigger than that now. Our data comes from many different sources, many of which you can contribute to and improve not just the state of computational knowledge, but of human knowledge.

To begin with, ConceptNet 5 contains almost all the data from ConceptNet 4, created by contributors to the Open Mind Common Sense project.

We connect to a subset of DBPedia, which extracts knowledge from the infoboxes on Wikipedia articles.

Much of our knowledge comes from Wiktionary, the free multilingual dictionary, a sister project to Wikipedia. This gives us information about synonyms, antonyms, translations of concepts into hundreds of languages, and multiple labeled word senses for many words.

More dictionary-style knowledge comes from WordNet.

UMBEL connects ConceptNet to the OpenCyc ontology via a Semantic Web representation.

Some knowledge about people's intuitive word associations comes from "games with a purpose". We learn things in English from the GWAP project's word game Verbosity, and in Japanese from nadya.jp. Silly input generator spotted

ConceptNet supports linked data: you can download a list of links to the greater Semantic Web, via DBPedia, UMBEL, and RDF/OWL WordNet. For example, our concept cat is linked to the DBPedia node at http://dbpedia.org/resource/Cat.

The idea is to see if I can feed it to various procedural text generators, very simple semantic analysis and a simple translation system.

EDIT:
think "scribblenaut"
Logged

Dacke
« Reply #256 on: September 15, 2015, 09:11:05 AM »

I'm asking what you're trying to do at the moment. The piece of code that you can't get working. You want to find all unique values while discarding the connections? Or what?

This is the part I'm asking about:

Quote
The bad thing is that if I could use Python and sets, I would be able to put arg1 into one set, arg2 into another, and extract all roots (arguments that only exist in arg2) and leaves (arguments that only exist in arg1) in a single operation.
« Last Edit: September 15, 2015, 09:28:34 AM by Dacke » Logged

gimymblert
« Reply #257 on: September 15, 2015, 04:56:39 PM »

mmm sorry!

I lost the update when I edited my post...
I didn't share the file I was working on, I shared the relation files.

This is what I'm working on now https://dl.dropboxusercontent.com/u/24530447/divers/dictionary.rar

It's all the arguments of all the files, one per line.
- I want to strip duplicates (using a set and saving the set back into the file)



Then, using the knowledge from it, manipulate some relations (the quote is about the "IsA" file) to find roots and leaves and store them in another file (when the structure is hierarchical).

I'm going step by step.
Later I will need to replace every argument's elements with their indexes in the dictionary file. If possible, by then I would have code to "recompile" the data from scratch (using the database) to extend to all languages and future versions of ConceptNet.
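
A sketch of the dedupe-then-index steps, assuming the dictionary is just one term per line. The terms and the choice to sort before numbering are mine, for illustration; sorting only serves to keep the indexes stable between runs, since a set has no fixed order.

```python
raw_lines = ["fox\n", "mammal\n", "fox\n", "animal\n", "mammal\n"]   # made-up data

# Step 1: strip duplicates with a set, then sort for a reproducible order.
unique_terms = sorted(set(line.strip() for line in raw_lines))

# Step 2: map each term to its position in the deduplicated list.
index_of = {term: i for i, term in enumerate(unique_terms)}

# Step 3: rewrite a relation using indexes instead of the terms themselves.
relation = ("fox", "mammal")
encoded = tuple(index_of[term] for term in relation)

print(unique_terms, encoded)
```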
Logged

valrus
« Reply #258 on: September 15, 2015, 09:27:20 PM »

In Python 3.x, try using the optional encoding argument of open; that usually gets me around all the trouble I used to have in 2.x with Unicode encoding errors.

Code:
with open('utf8lines.txt','r', encoding="utf-8") as infile:
    with open('unique-lines.txt', 'w', encoding="utf-8") as outfile:
        unique_lines = set(line.strip() for line in infile)
        outfile.write("\n".join(unique_lines))

If you're in 2.x, you can get this version of open from the io module (from io import open).
Logged
gimymblert
« Reply #259 on: September 15, 2015, 09:58:00 PM »

if you are talking about that
Quote
open('utf8lines.txt','r', encoding="utf-8")
Yeah I have done that, I'll try to copy paste your code as verification though
Logged
